"Security is [...] preventing adverse consequences from the intentional and unwarranted actions of others" [Bruce Schneier, Beyond Fear]
"Computer Security deals with the prevention and detection of unauthorized actions by users of a computer system" [D. Gollmann]

A security system prevents attacks; possibly also: detection, recovery, repair.

Security deals with:
- intentional actions (incidental actions: safety, not security)
- unwarranted actions (from the victim's point of view); they need not be illegal
- actions of others: implies the existence of an attacker and a target
- thinking about and modeling attacker capabilities is essential, incl. multiple, colluding attackers

- By knowing technical details (operating systems, networks, programming, crypto)
- By thinking like an attacker [see Schneier] (technical and social aspects)
  social engineering: e.g., impersonate maintenance staff to get access
- By understanding:
  fundamental notions: what needs to be protected? how? from what attacks?
  principles (design / construction): general, not necessarily technical

[B. Schneier, Beyond Fear]
- What assets are you trying to protect?
- What are the risks to those assets?
- How well does the security solution mitigate those risks?
- What other risks does the security solution cause?
- What costs and trade-offs does the security solution impose?
Confidentiality = protecting / hiding information or resources
- typically done through cryptography
- or other undisclosed mechanisms
- not just the content: even the existence of information may be confidential (cf. steganography)
- includes hiding the resources

Integrity = trust in data or resources
- expressed by preventing unauthorized modifications
We distinguish:
- data integrity (of content)
- data origin authentication
Integrity mechanisms:
- prevention mechanisms
  of unauthorized data manipulation (e.g. from outside)
  of data manipulation in unauthorized ways (e.g. from inside)
- detection mechanisms
[M. Bishop: Computer Security: Art and Science, Pearson, 2003]

Availability = the ability to use information or a resource in the desired way
A system which is not available can be worse than a nonexistent one.
Availability is usually analyzed under some (statistical) assumptions about the environment;
if the assumptions are not satisfied, the system may be compromised.
Denial of service attacks may be difficult to detect if the traffic (partially) matches the allowed statistical pattern.

Other lists of security properties:
- Privacy, Availability / Authentication, Integrity, Non-repudiation
- Parkerian Hexad (Donn Parker, 2002):
  confidentiality
  possession or control (important even without violating confidentiality)
  integrity
  authenticity (of origin or author)
  availability
  utility (e.g. data converted to a useless format)
- [Handbook of Applied Cryptography]: signature, authorization, access control, timestamping, witnessing (by someone other than the originator), confirmation, anonymity, revocation, traceability / accountability

Confidentiality, integrity, availability are security services.
We discuss (potential) threats and (real) attacks against those services.

Threat classification [R. Shirey, cf. M. Bishop]
- disclosure
- deception (forcing acceptance of false data)
- disruption = interrupting / stopping normal service
- usurpation = unauthorized control of part of a system

Microsoft STRIDE threat model
- Spoofing identity: impersonating
- Tampering with data: falsifying / attack on integrity
- Repudiation: negating the effect of an action
- Information disclosure: attack on confidentiality
- Denial of service: attack on availability
- Elevation of privilege: unauthorized additional rights

Typical attacks:
- interception (snooping); in particular: (passive) wiretapping
- modifying / altering data => deception, also interruption / usurpation (gaining control)
  active wiretapping, man-in-the-middle attack (actively changing content)
- impersonation (masquerading, spoofing)
- repudiation of origin (e.g. in commercial transactions)
- denial of receipt: a form of deception
- delay: could be service interruption, also usurpation
- denial of service

Design principles:
a) Economy of mechanism: keep the design as simple and small as possible
   => security by design, not as an afterthought
b) Fail-safe defaults: base access decisions on permission rather than exclusion (default deny)
c) Complete mediation: check every access, every time (including in exceptional cases, maintenance);
   NOT based on previously taken decisions
d) Open design (NOT: security through obscurity) => mechanisms may be publicly checked to gain trust
e) Separation of privilege: separation increases robustness
f) Least privilege: every program and user should operate with the minimal set of privileges needed for the given task
g) Least common mechanism: minimize common resources, interference among users, the mechanisms on which everything is based
h) Psychological acceptability: do not unduly interfere with common activity;
   if mechanisms are not simple, they will be misused or bypassed
2 additional ones:
- Work factor: compare the needed effort with the attacker's resources
- Compromise recording: in case of failure, an alarm is still useful

Further principles:
- the weakest link determines the security of the entire system
- principle of adequate protection: not maximal security, but utility at acceptable risk / cost
- principle of efficiency (cf. acceptability): appropriate, easy to use correctly
- defense in depth: layered protection
[Ninghui Li, CS 426: Computer Security, course, Purdue University]

Attacker actions:
- "probe": access a target to determine its characteristics
- "scan": systematically access (probe) several targets
- "flood": repeated access to a target to overload it
- authenticate: present an identity for verification and subsequent access
- bypass: circumvent a control / authorization process, using an alternate method to access a target
- spoof / masquerade: assume some other identity
- read
- copy
- steal (take into possession and eliminate the original)
- modify
- delete

Unauthorized results:
- unauthorized (increased) access to a system or network
- information disclosure (attack on confidentiality)
- information corruption (attack on integrity)
- denial of service (attack on availability)
- theft of resources (unauthorized use): a type of usurpation

Further observations:
- error modes: passive vs active (does not do what it should vs does what it shouldn't)
- danger of errors in rare cases
- security imbalances: effect of large-scale technologies
- fragile (brittle) systems vs systems resilient to errors
- protection methods: adaptive to unforeseen situations
- monocultures (homogeneous systems): vulnerable to the same attack, e.g. the majority of systems running Windows
- security is a human & social problem

In security, we make assertions (statements) about various entities.
These statements are not absolute; they are based on assumptions.
=> Security is a matter of trust: in whom / what can we trust?

Ken Thompson: Reflections on Trusting Trust (Turing Award Lecture '83)
inserted a trojan into the login program and the C compiler, to accept a special password (known by the originator), by using self-reproducing code
"You can't trust code that you did not totally create yourself"
"No amount of source-level verification or scrutiny will protect you from using untrusted code"

Unix file permissions:
- every file is owned by a user and a group
- individual permission bits: read, write, execute / search
- 3 groups of bits, for: user, group, others
The meaning for directories is more complex than for files:
- r is needed for read(), readdir(), opendir() => for ls
- x ("search") is needed for chdir() and stat() (any file)

What permissions are needed to read a file?
x on the entire path and r for the file
What permissions are needed for ls -l name?
needs info from the inode, thus x on the parent directory (also, x on the path); independent of the permissions on name
if name is a directory, ls -l lists its contents (needs r)
ls -ld only gives directory info, so the answer is as above
What permissions are needed to delete a file?
w in the parent directory, as well as x. Need not have w for the file!
What can you do with x on a directory but not r?
You can access a file with a known name, but can't search for a file (e.g. search for a file on a web server)

Special bits:
- sticky bit: for a directory: a file can only be deleted by its owner
- set user ID: execute with the effective ID of the file owner
- set group ID: execute with the effective ID of the file group

A security policy is a statement of what is, and what is not, allowed.
A security mechanism is a method, tool or procedure for enforcing a security policy.
[Bishop, Computer Security: Art and Science]
We need to check if the mechanism is correct. A mechanism may be:
- safe (does not allow states disallowed by the policy)
- precise (allows exactly what the policy specifies)
- broad (allows more than the policy does)

Access control: a mechanism to allow or deny an entity's access to a resource
"principal" / subject -> request -> guard / monitor -> object
Access control consists of two steps:
- authentication: Who made the access request?
- authorization: Does subject s have access rights for resource o?

We distinguish:
- a set of subjects or principals S
- a set of objects O
- a set of access modes A
Simplest: A = {observe, alter}. Usually not enough.
The Bell-LaPadula model refines this to: A = {execute, read, append, write}
When are distinctions between these modes useful?
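One way to make the (subject, object, mode) triple concrete is an access matrix consulted by the guard. The following is a minimal sketch, not a definitive design: the subjects, objects, the `rights` table and the function `guard_allows` are hypothetical names chosen for illustration. It grants only what is explicitly listed (default deny) and keeps append separate from read and write.

```c
#include <stdbool.h>

/* Access modes, following the refined set A = {execute, read, append, write}. */
enum mode { M_EXECUTE = 1, M_READ = 2, M_APPEND = 4, M_WRITE = 8 };

/* Hypothetical subjects and objects, chosen only for this illustration. */
enum subject { S_AUDITOR, S_SERVICE, N_SUBJECTS };
enum object  { O_LOG, O_KEYSTORE, N_OBJECTS };

/* Access matrix: rights[s][o] is the set of granted modes.
 * Entries not listed are 0, i.e. nothing is granted (default deny). */
static const unsigned rights[N_SUBJECTS][N_OBJECTS] = {
    [S_AUDITOR] = { [O_LOG] = M_READ },
    [S_SERVICE] = { [O_LOG] = M_APPEND, [O_KEYSTORE] = M_EXECUTE },
};

/* Guard: allow a request only if every requested mode is explicitly granted. */
bool guard_allows(int subject, int object, unsigned requested)
{
    if (subject < 0 || subject >= N_SUBJECTS)
        return false;                 /* unknown subject: deny */
    if (object < 0 || object >= N_OBJECTS)
        return false;                 /* unknown object: deny */
    if (requested == 0)
        return false;                 /* an empty request is not an access */
    return (rights[subject][object] & requested) == requested;
}
```

In this sketch the service may append to the log but can neither read it nor overwrite earlier entries, which would not be expressible with only {observe, alter}.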
- log: append, without changing prior contents
- execute: encryption, without knowing the key

A process has (in most newer versions) three user-related identifiers:
- real user ID: (initial) owner of the process
- effective user ID: determines access rights
- saved user ID: used to revert to a previous UID
Normally: ruid = euid = user launching the process
Exception: euid = owner of the loaded executable, when it has the s (setuid) bit set
=> running with other privileges (e.g. elevated)
(similar for group identifiers)

Q1: Why do we need functions to manipulate UIDs at runtime?
Q2: Why is saving the old UID not left to the programmer?

System call setuid(val):
- if euid = 0 (root), set ruid = euid = val (and saved uid too)
  => UIDs / privileges are set permanently
- else (euid != 0): can only set euid = val if val is the real or saved uid;
  ruid and saved uid are unchanged
Q3: What are the limitations if only this call exists?
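The rules above can be tried out on a small model, which helps when thinking about Q3. This is only a sketch of the semantics as stated on the slide, not the kernel's implementation; `struct proc_ids` and `model_setuid` are hypothetical names.

```c
#include <stdbool.h>

/* The three user-related identifiers of a process (model only). */
struct proc_ids {
    unsigned ruid;   /* real user ID */
    unsigned euid;   /* effective user ID: determines access rights */
    unsigned suid;   /* saved user ID */
};

/* Model of setuid(val) following the two cases described above.
 * Returns true if the call succeeds, false if it is refused. */
bool model_setuid(struct proc_ids *p, unsigned val)
{
    if (p->euid == 0) {
        /* root: all three IDs are set, so there is no way back */
        p->ruid = p->euid = p->suid = val;
        return true;
    }
    if (val == p->ruid || val == p->suid) {
        /* non-root: only euid changes; ruid and saved uid stay as they are */
        p->euid = val;
        return true;
    }
    return false;
}
```

In this model, a setuid-root program started by user 1000 (ruid = 1000, euid = suid = 0) that calls `model_setuid(&p, 1000)` ends with all three IDs equal to 1000 and cannot become root again, whereas a program whose executable is owned by a non-root user can switch back and forth between its real and saved UID.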
seteuid(val):
- allowed only if euid == 0 or if val is one of the three values (euid, ruid, saved uid)
- sets only euid; does not change ruid and saved uid
- changes are reversible by another seteuid call

Marius Minea, marius@cs.upt.ro
11 October 2017

Many high-level languages, with scores of features
Processor hardware: much simpler, less variation
The compiler must bridge this gap:
- general rules / schemes of translation
- adapted to the processor architecture
- additionally: optimizations (make code short, fast)

What will the following print?
    int i = 5;
    printf("%d %d %d\n", i++, i++, i++);
Compiler warns: behavior is undefined
several side effects (increments) on the same object (i)
(orderings between these side effects and value computations unknown)
But typical compilers will produce code consistently.

Expressions can be arbitrarily complex.
Most often, we have simple expressions like: x = 2*y or z = x + y
- two operands, one result
- (at most) three addresses (variables, pointers) in an instruction
- manageable complexity for humans and algorithms
- closer to processor capabilities
Can do source-to-source transformation of C to three-address code,
e.g., with the CIL analysis infrastructure (Necula et al., Berkeley)

Would three-address code be enough?
Processor instructions could have an opcode (for basic operations) + (at most) three addresses.
You might do this for a virtual machine in a compiler class project.
Why not in a real processor?
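To see what such a machine would look like, here is a minimal sketch of a three-address virtual machine of the kind a class project might use. Everything in it is an assumption made for illustration: the opcode set, `struct tac_instr`, and `tac_run` are hypothetical, and addresses are simply indices into an array of `int` cells.

```c
#include <stddef.h>

/* A tiny, hypothetical instruction set: opcode + three addresses. */
enum opcode { OP_ADD, OP_SUB, OP_MUL };

struct tac_instr {
    enum opcode op;
    size_t dst;    /* address of the result */
    size_t src1;   /* address of the first operand */
    size_t src2;   /* address of the second operand */
};

/* Execute n instructions; every operand and result lives in memory. */
void tac_run(const struct tac_instr *prog, size_t n, int *mem)
{
    for (size_t i = 0; i < n; i++) {
        int a = mem[prog[i].src1];   /* first memory read */
        int b = mem[prog[i].src2];   /* second memory read */
        int r = 0;
        switch (prog[i].op) {
        case OP_ADD: r = a + b; break;
        case OP_SUB: r = a - b; break;
        case OP_MUL: r = a * b; break;
        }
        mem[prog[i].dst] = r;        /* memory write */
    }
}
```

With memory cells x, y, a constant 2, a temporary t and z at addresses 0 to 4, the expression z = (x + y) * 2 becomes the two instructions `{OP_ADD, 3, 0, 1}` and `{OP_MUL, 4, 3, 2}`. Each instruction performs three memory accesses and carries three full addresses.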
Speed: memory access is slow
Code size: three arbitrary addresses make a long instruction

From early simple processors to CISC to RISC
CISC:
- powerful instructions, complex addressing modes
- multi-step operations in the same instruction
RISC:
- uniform short instructions (one word)
- general-purpose registers (with the same role)
- simple addressing, load / store architecture (separate memory access and arithmetic)
Ultimately, it seems microarchitecture has more performance impact.

[Figure: x86 general-purpose registers EAX, EBX, ECX, EDX, ESI, EDI, ESP (stack pointer), EBP (base pointer), each 32 bits wide; AX, BX, CX, DX are the low 16 bits of the first four, each split into two 8-bit halves]
On 64-bit architectures: extended registers: rax, rbx, etc.
http://www.cs.virginia.edu/~evans/cs216/guides/x86.html

Instructions have 1 to several bytes.
Opcode: says what the instruction does
1-byte instructions: ret, push / pop / inc / dec reg
Operands:
- register (8, 16 or 32-bit; also 64-bit)
- constant (8, 16 or 32-bit; also 64-bit)
- (contents of) memory address

For a memory access, must indicate the amount of data read / written: 1, 2, 4, 8 bytes
- in C, given by the type of the pointer: char *, int16_t *, etc.
- in assembly, must specify explicitly, e.g.
    mov BYTE PTR [...], 7
    mov WORD PTR [...], 7
    mov DWORD PTR [...], 7

Simplest memory transfer instructions:
push: first decrement, then put the value on the stack
    sp -= 4
    [sp] = value
pop: take the value at the stack pointer, then increment
    value = [sp]
    sp += 4

Addressing modes: [var], [reg - 4], [reg + reg], [reg + 4*reg]
Examples: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html

Arithmetic: add, sub, mul, div; imul, idiv (signed); inc, dec
Logic: and, or, xor, not
    xor bx, bx    ; short way to zero a register
Shifting: by a constant bit count, or by the value of register cl
    shl [mem], 3
    shr dx, cl

Most of these affect the flags:
- CF (carry): set when the unsigned result does not fit
- OF (overflow): set when the signed result does not fit
- SF (sign): arithmetic / logic result is negative
- ZF (zero): arithmetic / logic result is zero
- AF (auxiliary carry): carry from bit 3 to bit 4 in an 8-bit operand
- PF (parity): of the low-order byte: 1 if even number of 1 bits

When calling a function, several options to consider:
Where to pass arguments ?
(stack or registers?)
Argument passing order (from left or from right)?
Who cleans up the stack? (caller or callee)
Who saves registers? (caller or callee)

cdecl: 32-bit x86, many compilers, Unix-like systems
- args passed on the stack, right to left (allows varargs)
- result returned in eax
- caller saves eax, ecx, edx; callee saves the rest
- caller cleans up the stack
- stack aligned to a multiple of 16 bytes (since gcc 4.5)

Compile to assembly (cc could be gcc, clang, etc.):
    cc -S -masm=intel file.c
Extra options:
    -O2 to optimize
    -m32 compiles to 32 bits on a 64-bit system

[Figure: stack frame layout; the stack grows toward lower addresses. From lower to higher addresses: saved ESI and saved EDI (ESP points to the top of the stack), local variable 3, local variable 2, local variable 1 at [ebp]-4, the location EBP points to, return address, parameter 1 at [ebp]+8, parameter 2 at [ebp]+12, parameter 3 at [ebp]+16]
http://www.cs.virginia.edu/~evans/cs216/guides/x86.html

Optionally, high-level function enter / exit instructions:
- enter N, 0 is like push ebp; mov ebp, esp; sub esp, N
- leave is like mov esp, ebp; pop ebp

stdcall: Microsoft
- args also passed right to left, callee cleans up the stack
- cannot have variable-length arguments
syscall: like cdecl, but does not save AX, CX, DX

64-bit architecture has 8 more registers => can use them to pass values
System V AMD64 ABI:
- first 6 args passed in rdi, rsi, rdx, rcx, r8, r9
- return value in rax and rdx

How would you implement printf(char *fmt, ...), knowing that arguments are passed from right to left?

Function detection: important in reverse engineering
- may not know all entry points
- may not be able to follow all function calls, e.g. indirect calls through pointers in a table
- standard prologues / epilogues help the disassembler detect functions

    jmp address
    call address
Should address be absolute or relative to the program counter?
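To compare the two options, it helps to see how a relative target is computed. The sketch below assumes 32-bit addresses and the usual x86 rule that the displacement is added to the address of the next instruction; `rel_target` and `rel_disp` are hypothetical helper names, and `insn_len` is 5 for the common `call rel32` / `jmp rel32` encodings (one opcode byte plus a 4-byte displacement).

```c
#include <stdint.h>

/* Target of a relative jump / call: the displacement is added to the
 * address of the NEXT instruction (instruction address + its length). */
uint32_t rel_target(uint32_t insn_addr, uint32_t insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint32_t)disp;
}

/* Displacement to encode so that the instruction at insn_addr reaches target. */
int32_t rel_disp(uint32_t insn_addr, uint32_t insn_len, uint32_t target)
{
    return (int32_t)(target - (insn_addr + insn_len));
}
```

For a call at 0x1000 to a function at 0x1020, `rel_disp(0x1000, 5, 0x1020)` gives 0x1B. If the whole code is loaded 0x400000 bytes higher, `rel_disp(0x401000, 5, 0x401020)` is still 0x1B, so the encoded instruction does not need to change, whereas an absolute address would.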
Relative: important to have position-independent code (can be loaded at any address in memory)
Absolute jump / call instructions also exist.

cmp op1, op2    like (signed) subtraction, but does not change the left operand
test op1, op2   like bitwise AND, but does not change the left operand
both set flags => use for conditional jumps

Conditional jumps: based on a variety of flags (set by cmp / test)
    JA (above)      CF = 0 and ZF = 0
    JB (below)      CF = 1
    JC (carry)      CF = 1 (same as JB)
    JE (equal)      ZF = 1
    JG (greater)    ZF = 0 and SF = OF
    JL (less)       SF != OF
    JO (overflow)   OF = 1
    JS (sign)       SF = 1
    JZ (zero)       ZF = 1
also negations (JNA, JNB, etc.) + non-strict comparisons (JLE, JGE)
some mnemonics mean the same thing: JNGE = JL
conditional jumps have short versions with an 8-bit offset

Example: code resulting from the compilation of a switch statement

    typedef enum { ADD, SUB, MUL, DIV, MOD, AND, OR, XOR } op_t;
    [...]
        switch (op) {
            case ADD: return a + b;
            case SUB: return a - b;
            case MUL: return a * b;
            case DIV: return a / b;
            case MOD: return a % b;
            case AND: return a & b;
            case OR:  return a | b;
            case XOR: return a ^ b;
        }
        return 0;

Example: calls through a table of function pointers

    typedef int (*intfn_t)(int, int);
    int add(int a, int b) { return a + b; }
    int sub(int a, int b) { return a - b; }
    int mul(int a, int b) { return a * b; }
    int idiv(int a, int b) { return a / b; }
    intfn_t fntab[] = { add, sub, mul, idiv };

    int main(int argc, char *argv[]) {
        if (argc != 2) return 1;
        int op = atoi(argv[1]);
        [...]

Example: appending to a linked list through a pointer to a pointer

        [...] nxt);
        if ((*adr = malloc(sizeof(il)))) {
            (*adr)->el = val;
            (*adr)->nxt = NULL;
        }
        [...] lst;

[Figure: a three-element list; the top row of the picture denotes the addresses of individual fields.
    lst, at address 0x4dea8, holds 0xda050
    node at 0xda050: nxt = 0xda030 (at 0xda050), el = 3 (at 0xda058)
    node at 0xda030: nxt = 0xda010 (at 0xda030), el = 4 (at 0xda038)
    node at 0xda010: nxt = NULL (at 0xda010), el = 7 (at 0xda018)
    adr (iter 1) = 0x4dea8, adr (iter 2) = 0xda050, adr (iter 3) = 0xda030, adr (iter 4) = 0xda010]

Formal Verification: Temporal Logic. Model Checking Basics
Formal Verification, Lecture 2, Marius Minea
13 October 2008

Outline:
- Modeling systems using finite-state machines
- Formal specification of sequencing properties: temporal logic
- Model checking: verification by traversing the state graph

A: Systems whose behavior is described precisely
One of the simplest models: states and transitions (informally: "circles and arrows")
Another view: system state: set of all quantities that determine the behavior of the system in time
Representation: every state has a unique binary encoding (state variables)

Definition of state: depends on the abstraction level
Example for a processor: instruction set level; internal organization (incl. pipeline); register
transfer level; gate level; transistor level
- [ ], [ ] or [ ] systems
- finite (=> must be discrete) or infinite (continuous systems; programs with recursion or dynamic data structures)

Finite state machines (automata): defined by states and transitions
e.g. for a program: state = variables + program counter; transitions = statements
(finite state if finite types, no recursion, no dynamic data)

Our model: a set $V = \{v_1, v_2, \ldots, v_n\}$ of variables over a domain $D$
- a state: an assignment $s : V \to D$ of values in $D$ for each variable
- a state (assignment) $\Leftrightarrow$ a formula true only for that assignment
  sets of states: representable by logic formulas (e.g., a comparison of $v_1$ with 3)
- a transition $s \to s'$ has two states => a formula over $V \cup V'$,
  where $V'$ = copy of $V$ (next-state variables)
  e.g., $(semaphore = red) \land (semaphore' = green)$
- transition relation: set of all transitions = a formula $R(V, V')$

Kripke structure = finite-state automaton with labeled states
$M = (S, S_0, R, L)$
(compare with automata: labels (input symbols) on transitions)
- $S$: finite set of states
- $S_0 \subseteq S$: set of initial states
- $R \subseteq S \times S$: transition relation;
  $R$ is total if every state has at least one transition: $\forall s \in S \; \exists s' \in S . (s, s') \in R$
- $L : S \to \mathcal{P}(AP)$: state labeling function ($\mathcal{P}$: powerset, set of subsets),
  where $AP$ = set of atomic propositions (observable boolean features that appear in formulas, properties, specifications)
  Examples: a state is stable (or not); define the proposition: bad ::= number of errors > 0
Path (trajectory) from a state $s_0$: sequence of states $\pi = s_0 s_1 s_2 \ldots$ such that $R(s_i, s_{i+1})$ for all $i \geq 0$

Transitions are given as a relation, not a function
=> there can be several states $s'$ such that $s \to s'$, i.e., $(s, s') \in R$
in this case the model (Kripke structure) is called nondeterministic (the future behavior in
a state is not uniquely determined)
This is different from the DFA / NFA distinction:
finite state automata have transitions labeled with input symbols
=> deterministic if there is a unique next state for a given state + input symbol (even if different inputs can lead to different states)
For systems viewed as open (interacting with an environment), this is called input nondeterminism.
Typically, we view Kripke models as closed; we will discuss possible parallel composition with an environment.

Input-output (functional) behavior is not enough for many systems:
- they interact with the environment: reaction to a stimulus => reactive systems
- they often have infinite execution (operating systems, schedulers, servers)
=> A computation is an infinite sequence of states
Desired properties:
- a given (error) state is not reached (safety)
- the system does not deadlock
- etc.

More general properties can be described in temporal logic
= a modal logic, i.e., truth is qualified (possibly, always, etc.)
in this case: with temporal modalities: before, after, in the future, ...
- used already by ancient philosophers for reasoning about time
- formalized and applied by Pnueli (1977) to concurrent programs

Linear temporal logic (LTL)
Defined by Amir Pnueli in 1977 (ACM Turing Award 1996)
Describes event sequencing along an execution path => linear structure
- an event happens in the future
- a property is invariant (holds everywhere) starting at a given state
- an event follows another event

Temporal operators (truth modalities along an execution trace):
- X (next): in the next state; also written $\bigcirc$
- F (future): sometime in the future; also written $\Diamond$
- G (globally): in every future state (including now); also written $\Box$
  (unary operators, refer to one property)
- U (until): binary operator, property $p$ until property $q$
- sometimes also: release operator R (dual to until); ignored here

Express that a property is true on all paths => using the A quantifier
=> LTL formulas are of the form $A f$, where $f$ is a path formula. Syntax of path formulas:
    $f ::= p$                                                (base case: $p \in AP$ is an atomic proposition)
       $\mid \lnot f_1 \mid f_1 \lor f_2 \mid f_1 \land f_2$      (usual boolean connectors)
       $\mid X f_1 \mid F f_1 \mid G f_1 \mid f_1 \, U \, f_2$    (temporal operators)
Since the A quantifier is mandatory, and appears only once, it is sometimes left implicit (some authors write path formulas only).

LTL formulas of the form $A f$ have their meaning defined in a state => called state formulas: true if all paths from $s$ satisfy $f$
Path formulas have their meaning (truth value) defined over a path
Notations:
- $M, s \models f$: in the model (Kripke structure) $M$, state $s$ satisfies $f$
- $M, \pi \models f$: in model $M$, path $\pi$ satisfies $f$
- if $M$ is fixed (given), we simply write $s \models f$, $\pi \models f$
- $\pi^i$ = suffix of path $\pi = s_0 s_1 s_2 \ldots$ starting at $s_i$: $s_i s_{i+1} s_{i+2} \ldots$

Semantics of state formulas:
    $s \models p \Leftrightarrow p \in L(s)$                 (state $s$ has $p$ as a label)
    $s \models A f \Leftrightarrow \pi \models f$ for all paths $\pi$ from $s$
For path formulas, define the semantics as usual by structural induction: the semantics of a formula is given in terms of its simpler subformulas.

Semantics of path formulas (for $\pi = s_0 s_1 s_2 \ldots$):
    $\pi \models p \Leftrightarrow s_0 \models p$            ($p \in AP$ holds in the path origin)
    $\pi \models \lnot f \Leftrightarrow \pi \not\models f$
    $\pi \models f_1 \lor f_2 \Leftrightarrow \pi \models f_1 \lor \pi \models f_2$
    $\pi \models f_1 \land f_2 \Leftrightarrow \pi \models f_1 \land \pi \models f_2$
    $\pi \models X f \Leftrightarrow \pi^1 \models f$        ($f$ holds on the path suffix starting from state 1)
    $\pi \models F f \Leftrightarrow \exists k \geq 0 . \pi^k \models f$     (there exists a suffix on which $f$ holds)
    $\pi \models G f \Leftrightarrow \forall k \geq 0 . \pi^k \models f$     ($f$ holds on all path suffixes)
    $\pi \models f_1 \, U \, f_2 \Leftrightarrow \exists k \geq 0 . (\pi^k \models f_2 \land \forall j < k . \pi^j \models f_1)$

CTL*
LTL is not expressive enough (e.g., for: it is always possible to reach a state)
=> another model: computation tree (branching view):
infinite unfolding of a state-transition graph starting from an initial state
Additional path quantifier: E, there exists (a path)
Two classes of formulas:
- state formulas, evaluated in a state
    $f ::= p$                                                (base case: $p \in AP$ atomic proposition)
       $\mid \lnot f_1 \mid f_1 \lor f_2 \mid f_1 \land f_2$      ($f_1$, $f_2$ state formulas)
       $\mid E g \mid A g$                                    ($g$ path formula)
- path formulas, evaluated over a path
    $g ::= f$                                                (base case: $f$ is a state formula)
       $\mid \lnot g_1 \mid g_1 \lor g_2 \mid g_1 \land g_2$
       $\mid X g_1 \mid F g_1 \mid G g_1 \mid g_1 \, U \, g_2$
  (same rules as LTL, only the base case is more complex / expressive)
Semantics: same rules as LTL, plus:
    $s \models E g \Leftrightarrow$ there exists a path $\pi$ from $s$ with $\pi \models g$

CTL
defined by Clarke & Emerson (1981) => Turing Award 2007, with J. Sifakis, for model checking
Tradeoff: expressiveness of specifications vs efficiency of checking
=> CTL is a subset of CTL*, efficient to check, enough in many cases
CTL is a branching-time logic, like CTL*
CTL quantifies over paths starting from a state
=> operators X, F, G, U are immediately preceded by A or E
=> syntax of path formulas simplified, directly using state formulas:
    $g ::= X f \mid F f \mid G f \mid f_1 \, U \, f_2 \mid f_1 \, R \, f_2$
Expressiveness: LTL and CTL are incomparable (neither includes the other); both are less expressive than CTL*

Minimal set of operators:
    $f_1 \land f_2 = \lnot(\lnot f_1 \lor \lnot f_2)$
    $A g = \lnot E \lnot g$
    $F f = true \, U \, f$
    $G f = \lnot F \lnot f$
=> Operators $\lor$, $\lnot$, X, U and E suffice to express any formula

CTL has 2 x 4 = 8 pairs of quantifier x temporal operator:
    $AX f = \lnot EX \lnot f$
    $EF f = E[true \, U \, f]$
    $AF f = \lnot EG \lnot f$
    $AG f = \lnot EF \lnot f$
    $A[f \, U \, g] = \lnot E[\lnot g \, U \, (\lnot f \land \lnot g)] \land \lnot EG \lnot g$
=> all of them expressible using EX, EU and EG

Examples of CTL formulas:
- EF finish: it is possible to get to a state in which finish = true
- AG (send -> AF ack): any send is eventually followed by an ack
- AF AG stable: on any path, stable is invariant (always holds) after some point
- AG (req -> A [req U grant]): a req stays active until a grant is issued
- AG AF ready: on any path, ready holds infinitely often
- AG EF restart: from any state, it is possible to reach a state labeled restart

Given a Kripke structure $M = (S, S_0, R, L)$ and a temporal logic formula $f$,
find which States from S satisfy  : {stS|s^ f} Def: A formula (spec ) f holds in M iff aii initial States satisfy  : def M 1= f = Vs0 t Sq   *0 1= f - independently due to Clarke & Emerson; Queille & Sifakis (1981) - initially: 104-iO5 States Now: to iO100 States (symbolic checking) By structural decomposition of formula  : compute truth of all sub-formulas of f for each s e S - initially, set Z(s) = T(s) (atomic propositions true in state s) - trivial for logical connectors ^,v, л - EX : Just labei each state that has a successor labeled with   - to discuss: two algoritms for basic operators EU and EG Formal Verification Lecture 2 Marius Minea Formal Verification Temporal Logic Model Checking Basics 17 Formal Verification Temporal Logic Model Checking Basics idea: backwards traversai from States labeled  2 as long as Д holds procedure CheckEU(fi,  2) T := {,s i  2 E Ks)} ii fa holds in s forall s E T do Z(s) := Z(s) и {E then E[fi U  2] holds, labei s whileT 7^ 0 do still have candidates for search Choose s E T  T ' = T   {s}', never consider s twice forall si 7?(si,s) do for all predecessors of s if E Ksl) A  1 E Ksl) then si not labeled but Д holds Ksl) := Ksl) U {E }: -E[ i U fa] also holds, labei it T ' = T U {si}; si is candidate for continuing search Termlnates slnce S finite and no labeled state reenters T Consider only States satisfying   Traverse backwards starting from strongly connected components (on cycles where   perpetually holds) procedure CheckEG(f) restrict to States where f holds S':={s| e (s)}; SCC := {C  C is nontrivial SCC Of S'}', at least one edge T ' = UcEscc{s i s E C}', all States in SCCs are on cycles forall s E T do Z(s) ' = l(s) U {EG  }; thus get labeled whileT 7^ 0 do still have candidates for backwards search choose s g T; T ' = T   {s}; continue from s only once forall si si G need to extend CTL (semantics) with intuitively: decision fairness = if a decision (several transitions from a state) is repeated infinitely often, each 
branch is eventually taken
Reformulate: each destination state of the decision is eventually reached
Formally: A fairness constraint is a formula in temporal logic
A path is fair iff the constraint is infinitely often true along the path
in LTL, we would write: G F assumption => conclusion
in particular: fairness constraint expressed as a set of states
=> a fair path passes infinitely often through the set
Augment the Kripke structure, M = (S, S0, R, L, F), with F ⊆ 2^S (a set of state sets)
fair-CTL model checking reduces to CTL model checking for AP ∪ {fair}

Complexity:
- CTL model checking: O(|f| · (|S| + |R|)) (linear in the size of model and formula)
- CTL with fairness: O(|f| · (|S| + |R|) · |F|)
- LTL: PSPACE-complete, |M| · 2^O(|f|) (different type of algorithm, based on a tableau construction)
- CTL*: like LTL, |M| · 2^O(|f|)
CTL: usually preferred, because of the polynomial (linear!) algorithm
Spin uses LTL: exponential only in the size of the formula (usually small)

Behavior of composed systems emerges from component behavior
For concurrently executing components: parallel composition
synchronous: conjunction (simultaneous transitions)
  R(V, V') = R1(V1, V1') ∧ R2(V2, V2'),   V = V1 ∪ V2
asynchronous: disjunction (individual transitions)
  R(V, V') = R1(V1, V1') ∧ Eq(V \ V1)  ∨  R2(V2, V2') ∧ Eq(V \ V2)
  where Eq(U) = ∧ v∈U (v = v')
- arbitrary interleaving between transitions of components
- a transition modifies just the variables of one component
- simultaneous transitions are deemed impossible

Model Checking Basics
October 13, 2005
• Finite state systems
• Temporal logics: CTL*, CTL, LTL
• Explicit-state model checking
Formal verification 2, Marius Minea

- systems whose behavior can be described mathematically
- we analyze: the interaction of the system with its environment
- system state = all quantities
that determine its future behavior in time
- the definition of state depends on the level of abstraction
Example for a processor: instruction set level; internal organization (incl. pipeline, etc.); register transfer level; gate level; transistor level
System classification:
- discrete, continuous, or hybrid systems
- finite-state (necessarily discrete) or infinite-state (continuous systems, recursive programs, programs with dynamic data structures)

- Finite state machines (automata): states + transitions
- Programs (finite): variables + program counter
There is no conceptual difference!
Let V = {v1, v2, ..., vn} be a set of variables
- A state: an assignment s : V → D of values from a given domain D for each variable v ∈ V
- A state (assignment) corresponds to a formula true only for that assignment, e.g. (v1 = 7) ∧ (v2 = 4) ∧ (v3 = 2)
=> sets of states can be represented by logic formulas
- A transition s → s': a formula over V ∪ V', where V' = copy of V (next state variables)
  ex. (semaphore = red) ∧ (semaphore' = green)
- set of all transitions: transition relation = a formula R(V, V')

Kripke structure = labeled finite-state automaton M = (S, S0, R, L)
- S: finite state set
- S0 ⊆ S: set of initial states
- R ⊆ S × S: total transition relation, ∀s ∈ S ∃s' ∈ S . (s, s') ∈ R (from every state there is at least one transition)
- L : S → 2^AP: state labeling function
AP = set of atomic propositions (observations that appear in formulas / properties / specifications)
Examples:
- a state is stable or not
- define the proposition bad ::= red-recvd > 1 (Spin project)
Path (trajectory): sequence of states starting from s0: π = s0 s1 s2 ..., with R(si, si+1) for all i ≥ 0

• sequential circuits: a variable for each state element (register) and for primary inputs; instantaneous combinational propagation assumed
• asynchronous circuits: one variable for each signal (in more complex / accurate models: explicit physical time)
• programs: declared variables + program counter (for procedures, need to keep track of local variables on stack during time of
procedure activation; potentially infinite-state)

Types of composition (deriving system behavior from behavior of components)
• synchronous: conjunction (simultaneous transitions)
  R(V, V') = R1(V1, V1') ∧ R2(V2, V2'),   V = V1 ∪ V2
• asynchronous: disjunction (individual transitions)
  R(V, V') = R1(V1, V1') ∧ Eq(V \ V1)  ∨  R2(V2, V2') ∧ Eq(V \ V2)
  where Eq(U) = ∧ v∈U (v = v')
- arbitrary interleaving between component transitions
- a transition changes just the variables of one component
- simultaneous transitions considered impossible
Programs are usually modeled asynchronously (there is no physical synchronization between instructions of concurrent programs)

Reactive systems
- interact with the environment (reaction to a given stimulus)
- often have infinite execution
=> a computation = infinite sequence of states
=> it is not enough to represent input-output behavior
- Examples of properties: a given (error) state is not reached; the system does not deadlock
More generally: properties described in temporal logic
- logic (truth with temporal modalities)
- used starting in antiquity for reasoning about time
- formalized and applied by Pnueli (1977) to concurrent programs

LTL (linear temporal logic)
- defined by Pnueli in 1977 (Turing Award 1996)
- describes events along an execution trace => linear structure
  e.g. an event happens in the future; a property is invariant starting from a given timepoint; an event follows another event
Temporal operators (truth modalities along an execution trace):
• X: in the next state ○
• F: sometime in the future (incl. now) ◇
• G: globally (in every future state, starting now) □
• U: until; prop1 must hold until prop2 appears
sometimes we also define
• R (release): appearance of prop1 releases the need for prop2

- we wish a property to hold for all trajectories => we use the universal quantifier A
- formulas are of the form A f, where f is a path formula
- Syntax of path formulas:
f ::= p (for p ∈ AP) | ¬f1 | f1 ∨ f2 | f1 ∧ f2 | X f1 | F f1 | G f1 | f1 U f2 | f1 R f2

Denote M,s ⊨ f: in the model M, state s satisfies f
π^i = suffix of the path π = s0 s1 s2 ... starting at si
M,s ⊨ p ⇔ p ∈ L(s)
M,s ⊨ A f ⇔ ∀ path π from s, M,π ⊨ f
M,π ⊨ p ⇔ M,s ⊨ p, for p ∈ AP and s the first state of π
M,π ⊨ ¬f ⇔ M,π ⊭ f
M,π ⊨ f1 ∨ f2 ⇔ M,π ⊨ f1 ∨ M,π ⊨ f2
M,π ⊨ f1 ∧ f2 ⇔ M,π ⊨ f1 ∧ M,π ⊨ f2
M,π ⊨ X f ⇔ M,π^1 ⊨ f
M,π ⊨ F f ⇔ ∃k ≥ 0 . M,π^k ⊨ f
M,π ⊨ G f ⇔ ∀k ≥ 0 . M,π^k ⊨ f
M,π ⊨ f1 U f2 ⇔ ∃k ≥ 0 . M,π^k ⊨ f2 ∧ ∀j < k . M,π^j ⊨ f1

Some properties cannot be expressed in the linear time model: e.g. it is possible to reach a state
=> alternative model: branching time
CTL* path formulas:
g ::= f | ¬g1 | g1 ∨ g2 | g1 ∧ g2 | X g1 | F g1 | G g1 | g1 U g2 | g1 R g2
Semantics: similar to LTL, plus: M,s ⊨ E g ⇔ there exists a path π from s with M,π ⊨ g
Operators ¬, ∨, X, U and E suffice to express any CTL* formula

CTL [Clarke, Emerson 1981]
- sufficient in many cases, but simpler => more efficient algorithms
- branching structure, like CTL*
- quantifies over all possible execution paths from a state
- operators X, F, G, U, R must be immediately preceded by A or E
- syntax of path formulas: g ::= X f | F f | G f | f1 U f2 | f1 R f2
10 combinations, all expressible using EX, EG and EU:
AX f ≡ ¬EX ¬f
EF f ≡ E[true U f]
AF f ≡ ¬EG ¬f
AG f ≡ ¬EF ¬f
A[f U g] ≡ ¬EG ¬g ∧ ¬E[¬g U (¬f ∧ ¬g)]
E[f R g] ≡ ¬A[¬f U ¬g]
A[f R g] ≡ ¬E[¬f U ¬g]

Examples:
• EF finish: it is possible to reach a state in which finish = true
• AG (send → AF ack): any send is eventually followed by an ack
• AF AG stable: in any execution, from a given moment on, stable always holds
• AG (req → A[req U grant]): a req always stays active until receiving a grant
• AG AF ready: on any path, ready holds an infinite number of times
• AG EF restart: from any state it is possible to get to the restart state

CTL and LTL are incomparable:
- A F G p is in LTL, has no CTL equivalent
- AG EF p is in CTL, has no LTL equivalent
-
their disjunction is in CTL*, but not in CTL, nor in LTL
Some techniques (compositionality, abstraction) need restrictions: typically, only the universal quantifier A is allowed
- ACTL (included in CTL, incomparable to LTL)
- ACTL* (included in CTL*, more expressive than LTL)

Fairness
in practice: reasonable assumptions of the sort:
- an arbiter does not continuously ignore a particular request
- a continuously retransmitted message reaches its destination
= properties which can be expressed in CTL* but not in CTL
=> define a new semantics for CTL, with fairness
A fairness constraint is a formula in temporal logic
A path is fair if each constraint is true infinitely often along the path
in particular: constraint expressed as a set of states: a fair path passes through that set infinitely often

Augment the Kripke structure, M = (S, S0, R, L, F), by F ⊆ 2^S
(F = set of state sets, {P1, ..., Pn}, Pi ⊆ S)
def: inf(π) = {s | s = si for infinitely many i} (set of states appearing infinitely often on π)
π is fair ⇔ ∀P ∈ F . inf(π) ∩ P ≠ ∅ (π passes infinitely often through any set in F)
Denote by ⊨F the satisfaction relation with fairness
Modified clauses in CTL semantics:
M,s ⊨F p ⇔ there is a fair path from s and p ∈ L(s)
M,s ⊨F E g ⇔ ∃ fair path π from s with M,π ⊨F g
M,s ⊨F A g ⇔ ∀ fair paths π from s, M,π ⊨F g

Given a Kripke structure M = (S, S0, R, L) and a formula f in temporal logic, find the set of states in S that satisfy f: {s ∈ S | M,s ⊨ f}
The specification is satisfied if all initial states satisfy f: ∀s0 ∈ S0 . M,s0 ⊨ f
- independently, Clarke & Emerson, resp. Queille & Sifakis (1981)
- initially: 10^4 - 10^5 states; currently, with symbolic techniques: ca. 10^100 states
- Decompose according to the structure of formula f
For any s ∈ S, compute l(s) = set of subformulas of f true in s
- initially l(s) = L(s)
Trivial for the logic connectors ¬, ∨, ∧
- EX f: label any state with a successor labeled by f
- Other basic operators: EU and EG

E[f1 U f2]: backwards traversal from f2 as long as f1 holds
procedure CheckEU(f1, f2)
  T := {s | f2 ∈ l(s)};
  forall s ∈ T do l(s) := l(s) ∪ {E[f1 U f2]};
  while T ≠ ∅ do
    choose s ∈ T; T := T \ {s};
    forall s1 . R(s1, s) do
      if E[f1 U f2] ∉ l(s1) ∧ f1 ∈ l(s1) then
        l(s1) := l(s1) ∪ {E[f1 U f2]};
        T := T ∪ {s1};

EG f: consider only states that satisfy f
Traverse backwards starting from strongly connected components (SCC)
procedure CheckEG(f)
  S' := {s | f ∈ l(s)};
  SCC := {C | C is a nontrivial SCC in S'};
  T := ∪ C∈SCC {s | s ∈ C};
  forall s ∈ T do l(s) := l(s) ∪ {EG f};
  while T ≠ ∅ do
    choose s ∈ T; T := T \ {s};
    forall s1 . s1 ∈ S' ∧ R(s1, s) do
      if EG f ∉ l(s1) then
        l(s1) := l(s1) ∪ {EG f};
        T := T ∪ {s1};

Consider the fairness constraint F = {P1, ..., Pk}, with Pi ⊆ S
Let fair be a new atomic proposition, true in s iff there is a fair path starting from s
Thus fair ∈ L(s) ⇔ M,s ⊨F EG true
For the other operators, the problem is reduced to ordinary model checking:
M,s ⊨F p ⇔ M,s ⊨ p ∧ fair
M,s ⊨F EX f ⇔ M,s ⊨ EX (f ∧ fair)
M,s ⊨F E[f1 U f2] ⇔ M,s ⊨ E[f1 U (f2 ∧ fair)]
For M,s ⊨F EG f we modify the previous algorithm, considering only SCCs C with ∀i . C ∩ Pi ≠ ∅
(that contain at least one state from each component of the fairness constraint)

- model checking CTL: O(|f| · (|S| + |R|)) (linear in the size of model and formula)
- CTL with fairness F: O(|f| · (|S| + |R|) · |F|)
- LTL: PSPACE-complete, |M| · 2^O(|f|); different type of algorithm, based on a tableau (automaton) construction
- CTL*: like LTL, |M| · 2^O(|f|)

#include <math.h>
#include <stdio.h>

/* parameter and return types assumed to be double; the printf format string was lost in extraction */
double thirdside(double a, double b, double phi)
{
    return sqrt(a*a + b*b - 2*a*b*cos(phi));
}

int main(void)
{
    printf("%f\n", thirdside(3, 5, atan(1)));
    return 0;
}

Getting a parameter is NOT reading from input; returning a value is NOT printing it
A function will typically NOT ask for input
The smallest functions will just take arguments and return a result
This allows them to be composed
and used anywhere
A function will typically NOT print its result, just return it
(printing is inflexible: may want a different format, language, etc.)
We might write "wrapper" functions that ask for input, then call the computation function
We might also write display functions that get a value and print it

(Computational) problems are solved by writing functions
Inputs: usually given as arguments: f(3, 7), sometimes read from input
Functions that return a value:
  result produced with the return statement: return expression ;
  it must appear at the end of any path (branch) through the function
  else the function won't return a result! (warning: control reaches end of non-void function)
  the returned value can be ignored in a statement: f(5);
  or used, e.g. as a parameter: printf("%d", f(5)), etc.
Functions that do not return a value: return type void
  void print_int(int n) { printf("%d", n); }
  returns on reaching the closing brace OR return ; (NO expression)
  call: standalone in an expression statement: print_int(7);

any solvable complex problem can be solved using recursion => recursion is fundamental
Take some expression using integer arithmetic:
(2 + 3) * (4 + 2 * 3) - 5 * 6 / (7 - 2) + (4 + 3 - 2) / (7 - 3)
Can we compute it? YES, once we realize the expression is the sum of two simpler expressions:
(2 + 3) * (4 + 2 * 3) - 5 * 6 / (7 - 2)   and   (4 + 3 - 2) / (7 - 3)
We then compute the simpler expressions, decomposing similarly:
(2 + 3) * (4 + 2 * 3) - 5 * 6 / (7 - 2) = 44
(4 + 3 - 2) / (7 - 3) = 1   (integer division)
44 + 1 = 45
What was essential to compute the expression?
Decomposing: the expression is the sum of two simpler expressions
Expressing the operation: we can add, divide, etc. two numbers
Deciding the base case: if the expression is a number, we need to do nothing

From mathematics, we know recurrence relations for sequences
arithmetic sequence: xn = b for n = 0, xn = xn-1 + r for n > 0
Example: 1, 4, 7, 10, 13, ... (b = 1, r = 3)
geometric sequence: xn = b for n = 0, xn = xn-1 · r for n > 0
Example: 3, 6, 12, 24, 48, ... (b
= 3, r = 2)
xn is not computed directly, but using xn-1
A notion is recursive if it is used in its own definition
Exercise: write recurrences for: binomial coefficients, the Fibonacci sequence, ...

Recursion is fundamental in computer science: it reduces a problem to a simpler case of the same problem
sequence: a sequence is { a single element, or an element followed by a sequence }
  e.g. word (sequence of letters); number (sequence of digits)
path: a path is { a step, or a path followed by a step }
  e.g. traversing a path in a graph
expression: an expression is a number (7), an identifier (x), ...

float pwr(float x, unsigned n)
{
    return n == 0 ? 1 : x * pwr(x, n-1);
}

int main(void)
{
    printf("%f\n", pwr(-2.0, 3));
    return 0;
}

unsigned: type of nonnegative integers (natural numbers)
The header of pwr is a declaration of the function, so it can be used in its own function body
Even if we write pwr(-2, 3), -2 (int) will be converted to float (the type declared for each parameter is known)

The pwr function does two computations:
- a test (n == 0 ?): if so, return 1
- else, a multiplication; the right operand requires a recursive call
pwr(5, 3) = 5 * pwr(5, 2) = 125
pwr(5, 2) = 5 * pwr(5, 1) = 25
pwr(5, 1) = 5 * pwr(5, 0) = 5
pwr(5, 0) = 1

in the recursive computation of the power function:
Every call makes a new call, until the base case is reached
Every call executes the same code, but with different data (own values for parameters)
When reaching the base case, all started calls are still unfinished (each has to perform the multiplication with the result of the call it made)
Returning is done in reverse order of the calls (the call with exponent 0 returns, then the one with exponent 1, etc.)

Recursion = reduction to a simpler case of the same problem
The base case is simple enough for direct computation (can / need no longer be reduced)

x^n = 1               if n = 0
x^n = (x²)^(n/2)      if n > 0 even
x^n = x · (x²)^(n/2)  if n > 0 odd

float pow2(float x, unsigned n)
{
    return n == 0 ? 1
         : n % 2 == 0 ? pow2(x*x, n/2)
         : x * pow2(x*x, n/2);
}
What happens for n = 1 ?
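The two power functions can be compared by counting function activations. The following is a minimal sketch, not part of the original slides: the global counter `calls` and the helper `count_calls` are hypothetical additions for illustration, and the parameter types (`float`, `unsigned`) are assumed from the surrounding discussion.

```c
/* Illustration only: counts how many times pwr or pow2 is entered. */
static unsigned calls;

/* x^n by the direct recurrence: x^0 = 1, x^n = x * x^(n-1). */
float pwr(float x, unsigned n)
{
    calls++;
    return n == 0 ? 1 : x * pwr(x, n - 1);
}

/* x^n by repeated squaring:
   (x*x)^(n/2) for even n, x * (x*x)^(n/2) for odd n. */
float pow2(float x, unsigned n)
{
    calls++;
    return n == 0 ? 1
         : n % 2 == 0 ? pow2(x * x, n / 2)
         : x * pow2(x * x, n / 2);
}

/* Hypothetical helper: resets the counter, runs f(x, n) once,
   and reports how many calls that took. */
unsigned count_calls(float (*f)(float, unsigned), float x, unsigned n)
{
    calls = 0;
    (void)f(x, n);
    return calls;
}
```

Calling `count_calls(pwr, 2, 10)` and `count_calls(pow2, 2, 10)` shows the difference between a linear and a logarithmic number of calls; `count_calls(pow2, 5, 1)` makes the question about n = 1 concrete.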
needless computation of (x²)⁰ (which is 1)
rewrite:
x^n = 1               if n = 0
x^n = x               if n = 1
x^n = (x²)^(n/2)      if n > 1 even
x^n = x · (x²)^(n/2)  if n > 1 odd
A version that calls printf(..., x, n) on entry shows the calls made: ⌈log2(n + 1)⌉ calls
pow2(5, 6) → pow2(25, 3) → pow2(625, 1)

Recursion solves a problem by reducing it to a simpler case of the same problem
To use recursion, we must express the problem as a function:
things given / known to the function are parameters (index of recursive sequence; problem size; etc.)
the answer to the problem is the function result
Sometimes, the problem asks to produce an effect (print) rather than compute a result

A function body may have several statements:
{ printf(...); printf(...); }
{ printf(..., cos(0)); return 0; }
The function returns on reaching the closing brace OR a return statement
More generally, a block (compound statement) can appear in place of any statement
This is an example of recursion in the definition of statement:
statement ::= expression_optional ;   (incl. function call)
            | return expression_optional ;
            | { statement ... statement }

The conditional operator ? : selects from two expressions to evaluate
The if statement selects between two statements to execute:
if ( expression ) statement1 else statement2
or
if ( expression ) statement1
if the expression is true (nonzero), statement1 is executed, else statement2 is executed (or nothing, if the latter is missing)
Each branch has only one statement; if several statements are needed, these must be grouped in a block { }
An else belongs to the closest if:
if ( exp1 ) if ( exp2 ) stmt-then2 else stmt-else2
The parentheses around the condition are mandatory

Printing the roots of a quadratic equation (function body; the output strings were lost in extraction):
double delta = b * b - 4 * a * c;
if (delta >= 0) {
    printf(..., (-b - sqrt(delta)) / 2 / a);
    printf(..., (-b + sqrt(delta)) / 2 / a);
} else printf(...);

Can rewrite the absolute value function
{ return x > 0 ? x : -x; }
using the if statement:
{ if (x > 0) return x; else return -x; }

Fibonacci sequence: F0 = 0, F1 = 1, Fn = Fn-1 + Fn-2 for n > 1
inefficient to do direct recursion (exercise: how many calls?)
Can define Fibonacci words (strings): S0 = 0, S1 = 01, Sn = Sn-1 Sn-2 (formed by string concatenation)
Write a function that prints Sn
problem = function; effect = print; concatenation = sequencing

Fractals are self-similar figures (a part of the figure looks like the whole figure = recursion!)
What is the base case?
What defines a part of the figure?
http://mathworld.wolfram.com/BoxFractal.html

1. Base case: no recursive call = simplest case, defined directly
   e.g. in sequences: initial term x0 of the recurrence; the empty list (for a list of elements)
   A missing base case is an error: recursion never stops!
2. The recurrence relation defines a notion using a simpler case of the same notion
3. Proof (argument) that recursion stops in a finite number of steps
   (e.g. a nonnegative measure that decreases on each application)
   for sequences: the index (smaller in the definition body but ≥ 0)
   for recursive objects: size (component objects are smaller)

Are these recursive definitions correct?
? xn+1 = 2 · xn
? xn = xn+1 / 3
? a^n = a · a · ... · a (n times)
? a sentence is a sequence of words
? a sequence is the concatenation of two smaller sequences
? a string is a character followed by a string
A recursive definition must be well formed (conditions 1-3):
something cannot be defined only in terms of itself
one can only use other notions which are already defined
computation has to stop at some point

A natural number (in base 10) can be defined / viewed recursively:
a number is a digit, or: a digit preceded by a number (in base 10)
We can find the two parts using integer division (with remainder):
n = 10 · (n / 10) + n % 10        1457 = 10 · 145 + 7
the last digit of n is n % 10               1457 % 10 = 7
the number remaining in front is n / 10     1457 / 10 = 145
Exercises with a simple recursive solution: sum of a number's digits; number of digits; largest / smallest digit, etc.
Solution: always the same pattern
base case: for a single-digit number
recurrence: combine the last digit with the result for (n / 10)
(code fragments, partly lost in extraction: "1, if number < ..."; a helper returning a > b ? a : b)

double s(unsigned n) { return n == 0 ?
1 : s(n-1) + cos(n); }
int main(void) { printf("%f\n", s(1000000)); return 0; }
./a.out
Segmentation fault

Code executes sequentially (except for branch / call / return)
On function call, must remember where to return
Must store parameter and local values after the call, to keep using them
These are placed on the stack
Each function activation has its stack frame: arguments, return address, local vars
Nested calls return in the opposite order they were made => stack frames are popped in reverse order of saving (last in, first out)
For deep recursion, the stack may be insufficient => program crash
Stack (top to bottom):
  locals of f(0) | retaddr: to f(1) | args to f: n=0
  locals of f(1) | retaddr: to f(2) | args to f: n=1
  locals of f(2) | retaddr: to main | args to f: n=2
  locals: main

S0 = 1, Sn = Sn-1 + cos n
We know we'll have to add cos n (but not yet to what)
=> can anticipate and accumulate the values we need to add
When reaching the base case, the accumulator (partial result) is the answer
double s2(double acc, unsigned n) { return n == 0 ? acc : s2(acc + cos(n), n-1); }
double s(unsigned n) { return s2(1, n); }
Program now works!

A function is tail-recursive if the recursive call is the last operation in the function:
no computation is done after the call (e.g., with its result)
the result (if any) is returned unchanged between calls
parameter and local values are no longer needed after the call
=> replace the recursive call with a jump, return the value at the end (base case)
(Optimizing) compiler converts tail recursion to iteration (loop): need not worry about efficiency

Pattern:
if base case (are we done?): return (result)
else (not done): compute new partial result; call the recursive function with the new partial result
(usually an extra parameter, besides the initial input)
Exercise: rewrite Fibonacci
extra parameters: last, previous number
stopping condition: all iterations done
Often, the problem is restated with an explicit partial result (accumulator)

Example (reverting 1465):
remaining part n: 146, 14, 1, empty (0)
reverted end r:   5, 56, 564, 5641
What is the result of reverting, if the end has already been reverted, the resulting number is r, and the remaining part is n?
unsigned rev2(unsigned n, unsigned r) { return n == 0 ? r : rev2(n / 10, 10*r + n % 10); }
unsigned rev(unsigned n) { return rev2(n, 0); }
Careful: return the accumulated result r in the base case (else the computation is thrown away!)
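The accumulator pattern can be checked on small inputs. The sketch below restates the digit-reversal pair and adds a tail-recursive Fibonacci as one possible answer to the exercise; the names `fib2` and `fib` and the choice of `unsigned` are assumptions made for illustration, not taken from the slides.

```c
/* Tail-recursive digit reversal: r holds the already reversed end,
   n is the part of the number that remains to be processed. */
unsigned rev2(unsigned n, unsigned r)
{
    return n == 0 ? r : rev2(n / 10, 10 * r + n % 10);
}

/* Wrapper: starts with an empty (0) reversed part. */
unsigned rev(unsigned n)
{
    return rev2(n, 0);
}

/* Tail-recursive Fibonacci (hypothetical names):
   cur = F(k), next = F(k+1), and n steps are still to be done.
   Nothing is computed after the recursive call returns. */
unsigned fib2(unsigned n, unsigned cur, unsigned next)
{
    return n == 0 ? cur : fib2(n - 1, next, cur + next);
}

/* Wrapper: starts from F(0) = 0 and F(1) = 1. */
unsigned fib(unsigned n)
{
    return fib2(n, 0, 1);
}
```

In both functions the base case returns the accumulated value, and the recursive call is the whole result of the function, so a compiler may turn it into a loop. Note that trailing zeros disappear on reversal: `rev(120)` yields 21.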
Babylonian method: a0 = 1, an+1 = (an + x / an) / 2
sequence of approximations => recursive solution
given (parameters): x and the current approximation
result = a satisfactory approximation (precision ε)
Re-state the problem: compute √x, given the approximation an
Computation: if the precision is good (|an+1 - an| < ε) ...
(code partly lost in extraction: ... fabs(an - x / an) ...)

x < -3 ? ... : (x > 3 ? 6 : 2*x)
if x ≥ -3 we still need to ask whether x > 3
if x is not > 3, it must be that x ∈ [-3, 3]
The conditional expression is an expression: it may be used anywhere an expression is needed
Example: as an expression of type string
puts: function that prints a string to stdout, followed by a newline
{
    puts(n == 0 ? ...
       : n > 0 ? ...
       : ...);
}
Note the layout for readability: one question per line

Expression: arithmetic operations: x + 1; function call: fact(5)
Statement: return n + 1;
Any expression followed by ; becomes a statement:
n + 3;   (computes, but does not use the result)
printf("hello!");   we do not use the result of printf but are interested in the side effect, printing
printf returns an int: number of chars written (rarely used)
Statements contain expressions
Expressions don't contain statements

Statements are written and executed in order (sequentially)
With decision, recursion and sequencing we can write any program
Compound statement (block): several statements between { }
A function body is a compound statement
{
    double pi = acos(-1);
    printf(..., pi);
    double diff = sqrt(.5) - sin(pi / 4);
    printf(..., diff);
}
A compound statement is considered a single statement
May contain declarations: anywhere (C99 / C11), at start (C89)
All other statements are terminated by a semicolon
The sequencing operator is the comma: expr1 , expr2
evaluate expr1, ignore its value; evaluate expr2 => value of the whole expression

The branches of an if can be any statements => also if statements
=> can chain decisions one after another
if (op == '+') printf(..., a + b);
else if (op == '-') printf(..., a - b);
else puts(...);
The checks op == '+' and op == '-' are mutually exclusive; do NOT write
if (op == '+') printf(..., a + b);
if (op == '-') printf(..., a - b);
it is pointless to do the second test if the first was true (op cannot be both + and - at the same time)
The proper code
is with chained if-else statements (or a switch statement)
if each branch ends with returning a value, the else is not needed:
we only get to a branch if the previous condition was false (else the function will have returned):
if (op == '+') return a + b;
if (op == '-') return a - b;
puts(...);
return 0;
Often, we first deal with error cases, then do the actual processing:
if (n > 100) { puts(...); ... }
...

Lambda calculus is a model of computation
We've seen: computation is done by functions
in general, both function and arguments can be expressions
e ::= x        variable
    | λx.e     function abstraction (definition)
    | e1 e2    function application
Basic ideas:
functions are values (no split between functions and args / results)
functions need not be named (λ-abstractions suffice)
functions are all one needs (can express numbers, if-then, etc.)
Syntax conventions:
the scope of the abstraction extends as far right as possible
application is left-associative: e1 e2 e3 means (e1 e2) e3

The function abstraction λx.e binds the occurrences of x in e
intuitively: inside e, x is the argument; outside e it has no meaning
Set of free variables of an expression:
FV(x) = {x}
FV(λx.e) = FV(e) \ {x}
FV(e1 e2) = FV(e1) ∪ FV(e2)
A term is closed if it has no free variables
A variable that is not free is called bound

Calling a function means using the (actual) argument in place of the (formal) parameter
in most languages, this means evaluating the argument expressions
in lambda calculus, we will just do syntactic substitution
To correctly compute with λ-expressions, we need to define substitutions
Denote by e1[x → e2] the substitution of x by e2 in e1
(various other notations: e1[x := e2], e1[x / e2], e1[e2 / x])
Define:
y[x → e] = e                       if y is the same as x
y[x → e] = y                       if y is different from x
(λy.e1)[x → e2] = λy.e1            if y is the same as x
(λy.e1)[x → e2] = λy.(e1[x → e2])  if y is different from x and y ∉ FV(e2)
  (otherwise occurrences of y in e2 would be captured by λy.e1)
(e1 e2)[x → e] = (e1[x → e]) (e2[x → e])

α-conversion (bound variables can be renamed): λx.e = λy.(e[x → y]) if y ∉ FV(e)
Then we can substitute (λy.e1)[x → e2] also when y ∈ FV(e2): first rename y to
some fresh variable z: λy.e1 = λz.e1[y → z]
then substitute x with e2: λz.e1[y → z][x → e2]

β-conversion (or β-reduction): (λx.e1) e2 = e1[x → e2]
the computation step for lambda expressions
We write: (λx.e1) e2 →β e1[x → e2]
The third rule is the η-conversion: λx.e x = e if x ∉ FV(e)
simplifies application + abstraction
Two terms are equivalent if one can be converted to the other by the three conversion rules

A λ-expression may have several β-reducible subexpressions (redexes)
=> which one to apply first?
Church-Rosser theorem: if a term reduces to two different terms, these in turn reduce to a common term (diamond property)
e →* e1 ∧ e →* e2 ⇒ ∃e' . e1 →* e' ∧ e2 →* e'

Precedence: allows disambiguating expressions, without need for excess parentheses
Associativity: how to evaluate operators with the same precedence
left-associative, right-associative
operators may be associative in math, but not in a programming language
Evaluation order of operands for a given operator: specified or unspecified

Normal order: leftmost outermost redex first; also reduces under λ
Call by name: leftmost outermost redex first; does not reduce under λ
Applicative order (call by value): only reduce (λx.e1) e2 when the argument e2 is a value
in programming language practice: lazy evaluation: only reduce the argument if needed, but do not duplicate expressions (evaluate at most once)

Usually, recursion requires naming the recursive object
But λ-calculus does not let us introduce names
Start from the diverging (infinite) self-application (λx.x x)(λx.x x)
Define another closed term that applies a function to an argument:
Y = λf.(λx.f(x
x))(λx.f(x x))
Y is called a fixpoint combinator, because Y f = f(Y f) (show!)

4 October 2017
Types of black-box testing
Equivalence class partitioning
Boundary testing
Cause-effect analysis
Exploratory testing

Product is viewed as an opaque system (no access to internal details; this includes the source code)
applicable to any product
no effort for source code analysis
applicable from simple to complex, and in a variety of situations

Or:
Function testing
  test each function in isolation; basic functionality
  tests are credible, easy to evaluate, not very powerful
Domain testing
  essence: sample equivalence classes through representatives
  initially one variable at a time, then combinations
  well-chosen values
  powerful, informative tests
Specification-based testing
  tests for every claim in the specification / req. list / model / manual
  conformance is very significant; choose representative tests
  can go deeper: find errors / omissions / ambiguities / limit cases in the spec
Risk-based testing
  imagine a way the program could fail, test for it
  tests must be powerful, credible, motivating
Stress testing: several definitions
  1) under a burst of activity
  2) at / beyond specified limits, to cause failure (IEEE std.)
  3) to see how the program fails (important!)
Regression testing
  test set designed for reuse after every program change
  no longer powerful, but well documented for maintenance
User testing
  real, not simulated users (beta testing)
  using specified scenarios, or freely
  credible, motivating, not always powerful (depends on user)
Scenario testing
  specific use case; may be model-based
  credible, motivating, easy to evaluate, complex
  going deeper: use scenario in limit / hostile case
State-model-based testing
  model: finite-state automaton
  analyze model, then product with model-based tests
High-volume automated testing
Exploratory testing
  the tester actively guides the testing process
  designs new tests based on info offered by existing tests

1. Start with simple (obvious) tests (grave if they fail)
2. Test each function, understand behavior before criticizing
3. Test broadly before deeply. Cover the program before focusing
4. More powerful tests, boundary conditions
5. Expand scope, look for challenges
6. Freestyle exploratory testing

Equivalence partitioning
Analyze the domain of values for each variable or input, identify sets for which we assume tests behave alike
=> used to generate a set of "interesting" conditions for testing
Desirable: a test case should cover several relevant conditions (should reduce the number of conditions to analyze by more than one)
For every condition: tests with valid and invalid values
Myers suggests using a table of the form
  Condition | Valid equiv. classes | Invalid equiv. classes
Depending on the variable type / domain:
For an interval: one valid case (inside), two invalid ones (on both sides); will refine for boundary testing
For a fixed (specified) number: one valid case, two invalid cases (larger, smaller)
For an enumeration type: each value, plus an invalid one
Combining equivalence classes into test cases:
cover as many valid classes as possible with one test case
generate a separate test for each invalid class (if combined, an invalid condition may mask another)

Declaring dimensions of an array in FORTRAN [Myers]
DIMENSION array-descrp ( , array-descrp )*
array-descrp
::= name ( dim ( , dim )* )
name ::= letter ( letter | digit )*   (1-6 chars)
dim ::= [ lower-bound : ] upper-bound
bound ::= int-constant | name
-65534 ... 2001

UI attacks: refresh screen (done completely?)
Try to overstep internal limits, e.g. create a table of maximum size, then add a row
Computations with invalid operators / operands
Test recursive inclusions (frame in frame; footnote in footnote, etc.)

5 October 2017
A security policy is a statement of what is, and what is not, allowed
A security mechanism is a method, tool or procedure for enforcing a security policy
[Bishop, Computer Security: Art and Science]
we need to check if the mechanism is correct
A mechanism may be:
- safe (does not allow states disallowed by the policy)
- precise (allows exactly what the policy specifies)
- broad (allows more than the policy does)

Access control: a mechanism to allow or deny an entity's access to a resource
"principal" / subject → request → guard / monitor → object
Access control consists of two steps:
authentication: Who made the access request?
authorization: Does subject s have access rights for resource o?

We distinguish:
- a set of subjects or principals S
- a set of objects O
- a set of access modes A
Simplest: A = {observe, alter}. Usually not enough
The Bell-LaPadula model refines this to: A = {execute, read, append, write}
When are distinctions between these modes useful?
log: append, without changing prior contents
execute encryption, without knowing the key

Access control matrix
The simplest and most general organization of access control
Two dimensions: subjects and objects
- every matrix entry S × O: set of rights / permissions
- a subject may also be an object (e.g. a process): has the right to read / write (to) / execute another process

        file1   /etc/passwd   /bin/rlogin
Alice   r, w    r             x
Bob     r       -             r, x

Access control list: a representation of the access control matrix where each object has an associated list of subjects and their permissions
simple case: Unix permissions
Richer set of permissions: Andrew File System (distributed)
  read
  list (for directories: content)
  insert (new file in directory)
  delete (file from directory)
  write
  lock (may use lock in directory)
  administer
(grant right): the right to give others permission
(admin): the right to give oneself permissions
A subject may not give rights it does not possess to another
Q: in Unix, the owner of a file may grant others (group / others) read rights on the file, even if (s)he does not have these rights. Is the above principle violated?

Is it possible to design a correct access control system?
Def: A system is safe with respect to a given right (of a subject over an object), if there is no sequence of transitions (operations) by which the right could be added, assuming it does not exist at first
Def: A system is safe with respect to a given right (of a subject over an object) if there is no sequence of transitions (operations) by which the right could be added, assuming it does not exist at first.

Theorem: The safety of an arbitrary system in a state, relative to a given access right, is undecidable.
Proof: a Turing machine can be reduced to (encoded into) such a system, essentially because of the ability to create objects.
There are simpler subclasses of systems that are decidable:
- if there are no create primitives
- if the systems are monotonic (if create, no destroy) and only single conditions are allowed

Discretionary access control (DAC): allows a user (typically: the owner) to set the mechanisms by which access is granted / forbidden.
Mandatory access control (MAC): access is controlled by the system and cannot be changed by the user; usually based on a set of rules (rule-based access control).
Q: What are the advantages and disadvantages of each category?

Role-based access control (RBAC): system-determined policy, depending on the active role of a subject.
- 3 (or more) levels: subject -> role -> object
- permissions are defined depending on the role
- a subject may have access when acting in one role, but not in another
- role hierarchy (some roles may be included in others): attending physician ⊂ physician ⊂ medical personnel
- can model various requirements, e.g. separation of duty; ex.: a bank loan must be approved by two different bank officers

Security policies must be carefully specified => policy description languages

Norman Hardy, The Confused Deputy (or why capabilities might have been invented), ACM SIGOPS Operating Systems Review, 22(4), 1988
Never Separate An Object From Its Authority
Who is to blame?
- The code to deposit the debugging output in the file named by the user?
- Must the compiler check to see if the output file name is in another directory?
- Should the compiler check for directory name SYSX?
- Should the compiler check for the name (SYSX)BILL?
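The role-based scheme described above (subject -> role -> object, with a role hierarchy and separation of duty) can be sketched in C. This is an illustrative sketch, not a reference design: the permission flags, the parent table, and the names role_perms, role_allows, approve and loan_approved are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Roles from the example hierarchy:
   attending physician is included in physician, which is included in medical personnel. */
enum { MEDICAL_PERSONNEL, PHYSICIAN, ATTENDING_PHYSICIAN, NUM_ROLES };

/* Hypothetical permissions, as bit flags. */
enum { P_READ_CHART = 1, P_PRESCRIBE = 2, P_DISCHARGE = 4 };

/* parent[r] = the more general role that r is included in (-1 for the most general). */
static const int parent[NUM_ROLES] = {
    [MEDICAL_PERSONNEL] = -1,
    [PHYSICIAN] = MEDICAL_PERSONNEL,
    [ATTENDING_PHYSICIAN] = PHYSICIAN,
};

/* Permissions attached directly to each role. */
static const unsigned direct_perms[NUM_ROLES] = {
    [MEDICAL_PERSONNEL] = P_READ_CHART,
    [PHYSICIAN] = P_PRESCRIBE,
    [ATTENDING_PHYSICIAN] = P_DISCHARGE,
};

/* A role holds its own permissions plus those of every more general role. */
unsigned role_perms(int role)
{
    assert(role >= 0 && role < NUM_ROLES);
    unsigned perms = 0;
    for (int r = role; r != -1; r = parent[r])
        perms |= direct_perms[r];
    return perms;
}

/* Access depends on the role the subject is currently acting in. */
bool role_allows(int active_role, unsigned perm)
{
    return (role_perms(active_role) & perm) == perm;
}

/* Separation of duty: a loan needs approvals from two different officers. */
struct loan { int approver[2]; int count; };

/* Record an approval; the same officer cannot approve twice. */
bool approve(struct loan *l, int officer)
{
    if (l->count >= 2)
        return false;
    if (l->count == 1 && l->approver[0] == officer)
        return false;
    l->approver[l->count++] = officer;
    return true;
}

bool loan_approved(const struct loan *l)
{
    return l->count == 2;
}
```

A subject acting as PHYSICIAN may prescribe but not discharge; the same subject acting as ATTENDING_PHYSICIAN may do both.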
Capability: a term with slightly different meanings:
- in one usage, an identifier that denotes an object and the rights associated with it
  ex.: file descriptor / handle (on open, the access mode is also set)
- in another, a capability is a set of rights of a subject (corresponds to a row in the access control matrix)
E.g.: POSIX / Linux capabilities (capability.h)
for a new process: new = forced | (allowed & inheritable)
Examples: CAP_CHOWN, CAP_KILL, CAP_SETUID, CAP_SETPCAP

Inspired by the military domain. Defines security levels, e.g. public

Formal Verification, Lecture 2 (Marius Minea): Temporal Logic. Model Checking Basics

One of the simplest models: states and transitions (informally: "circles and arrows").
Another view: system state = set of all quantities that determine the behavior of the system in time.
Representation: every state has a unique binary encoding (state variables).
Definition of state: depends on the level of abstraction.
Example for a processor: instruction set level; internal organization (incl. pipeline); register transfer level; gate level; transistor level.
Systems may be finite (=> must be discrete) or infinite (continuous systems; programs with recursion or dynamic data structures).

Finite state machines (automata): defined by states and transitions
ex.: program state = variables + program counter; transitions = statements (finite state if finite types, no recursion, no dynamic data)

Our model: a set $V = \{v_1, v_2, \ldots, v_n\}$ of variables over a domain $D$
- a state: an assignment $s : V \to D$ of values for each variable in $D$
- a state (assignment) <=> a formula true only for that assignment:
  $(v_1 \leftarrow 7, v_2 \leftarrow 4, v_3 \leftarrow 2)$ <=> $(v_1 = 7) \land (v_2 = 4) \land (v_3 = 2)$
- a formula <=> the set of all assignments that make it true
  => sets of states are representable by logic formulas
- a transition $s \to s'$ has two states => a formula over $V \cup V'$, where $V'$ = copy of $V$ (next-state variables)
  e.g., $(semaphore = red) \land (semaphore' = green)$
- transition relation: set of all transitions = a formula $R(V, V')$

Kripke structure = finite-state automaton with labeled states M =
$(S, S_0, R, L)$ (compare with automata: labels (input symbols) on transitions)
- $S$: finite set of states
- $S_0 \subseteq S$: set of initial states
- $R \subseteq S \times S$: transition relation; it is total if every state has at least one transition: $\forall s \in S \; \exists s' \in S \; (s, s') \in R$
- $L : S \to \mathcal{P}(AP)$: state labeling function ($\mathcal{P}$: powerset, i.e. set of subsets), where $AP$ = set of atomic propositions (observable boolean features that appear in formulas, properties, specifications)
Examples: a state is stable (or not); define the proposition: bad ::= number of errors > 0

Path (trajectory) from a state $s_0$: a sequence of states $\pi = s_0 s_1 s_2 \ldots$ such that $R(s_i, s_{i+1})$ for all $i \geq 0$

Transitions are given as a relation, not a function
=> there can be several states $s'$ such that $s \to s'$, i.e., $(s, s') \in R$
In this case the model (Kripke structure) is called nondeterministic (the future behavior in a state is not uniquely determined).
This is different from the DFA / NFA distinction: finite state automata have transitions labeled with input symbols => deterministic if there is a unique next state for a given state and input symbol (even if different inputs can lead to different states).
For systems viewed as open (interacting with an environment), this is called input nondeterminism.
Typically, we view Kripke models as closed; we will discuss possible parallel composition with an environment.

Input-output (functional) behavior is not enough for many systems:
- they interact with the environment: reaction to a stimulus
- they often have infinite executions (operating systems, schedulers, servers)
=> A computation is an infinite sequence of states
Desired properties: a given (error) state is not reached, the system does not deadlock, etc.
More general properties can be described in temporal logic = a modal logic, i.e., truth is qualified (possibly, always, etc.); in this case, with temporal modalities: before, after, in the future, ...
- used already by ancient philosophers for reasoning about time
- formalized and applied
by Pnueli (1977) to concurrent programs

Linear temporal logic (LTL): defined by Amir Pnueli in 1977 (ACM Turing Award 1996).
Describes event sequencing along an execution path => linear structure
- an event happens in the future
- a property is invariant (holds everywhere) starting at a given state
- an event follows another event

Temporal operators (truth modalities along an execution trace):
- $\mathbf{X}$ (next): in the next state; also written $\bigcirc$
- $\mathbf{F}$ (future): sometime in the future; also written $\Diamond$
- $\mathbf{G}$ (globally): in every future state (including now); also written $\Box$
  (3 unary operators, each refers to one property)
- $\mathbf{U}$ (until): binary operator, property1 until property2
Sometimes also: release operator $\mathbf{R}$ (dual to until); ignored here.

Express that a property is true on all paths => using the $\mathbf{A}$ path quantifier
=> LTL formulas are of the form $\mathbf{A} f$, where $f$ is a path formula:
- $f ::= p$   (base case: $p \in AP$ is an atomic proposition)
- $\mid \neg f_1 \mid f_1 \lor f_2 \mid f_1 \land f_2$   (usual boolean connectors)
- $\mid \mathbf{X} f_1 \mid \mathbf{F} f_1 \mid \mathbf{G} f_1 \mid f_1 \,\mathbf{U}\, f_2$   (temporal operators)
Since the $\mathbf{A}$ quantifier is mandatory, and appears only once, it is sometimes left implicit (some authors write path formulas only).

LTL formulas of the form $\mathbf{A} f$ have their meaning defined in a state => called state formulas: true if all paths from $s$ satisfy $f$.
Path formulas have their meaning (truth value) defined over a path.
Notations:
- $M, s \models f$: in the model (Kripke structure) $M$, state $s$ satisfies $f$
- $M, \pi \models f$: in model $M$, path $\pi$ satisfies $f$
- if $M$ is fixed (given), we simply write $s \models f$, $\pi \models f$
- $\pi^i$ = suffix of path $\pi = s_0 s_1 s_2 \ldots$ starting at $s_i$: $\pi^i = s_i s_{i+1} \ldots$
Semantics of state formulas:
- $s \models p \iff p \in L(s)$   (state $s$ has $p$ as a label)
- $s \models \mathbf{A} f \iff \pi \models f$ for all paths $\pi$ from $s$
For path formulas, define semantics as usual by structural induction: the semantics of a formula is given in terms of its simpler subformulas.
Semantics of path formulas:
- $\pi \models p \iff p \in L(s_0)$   ($p \in AP$ holds in the path origin)
- $\pi \models \neg f \iff \pi \not\models f$
- $\pi \models f_1 \lor f_2 \iff \pi \models f_1 \lor \pi \models f_2$
- $\pi \models f_1 \land f_2 \iff \pi \models f_1 \land \pi \models f_2$
- $\pi \models \mathbf{X} f \iff \pi^1 \models f$   ($f$ holds on the path suffix starting from state 1)
- $\pi \models \mathbf{F} f \iff \exists k \geq 0 \;.\; \pi^k \models f$   (there exists a suffix on which $f$ holds)
- $\pi \models \mathbf{G} f \iff \forall k \geq 0 \;.\; \pi^k \models f$   ($f$ holds on all path suffixes)
- $\pi \models f_1 \,\mathbf{U}\, f_2 \iff \exists k \geq 0 \;.\; \pi^k \models f_2 \land \forall j < k \;.\; \pi^j \models f_1$

The linear view is not expressive enough (e.g., it is always possible to reach a state)
=> another model: computation tree (branching view): unfolding of a state-transition graph starting from an initial state

CTL*: additional path quantifier: $\mathbf{E}$, there exists (a path); also written $\exists$
Two classes of formulas:
State formulas, evaluated in a state:
- $f ::= p$   (base case: $p \in AP$ atomic proposition)
- $\mid \neg f_1 \mid f_1 \lor f_2 \mid f_1 \land f_2$
- $\mid \mathbf{E} g \mid \mathbf{A} g$   ($g$ path formula)
Path formulas, evaluated over a path:
- $g ::= f$   (base case: $f$ is a state formula)
- $\mid \neg g_1 \mid g_1 \lor g_2 \mid g_1 \land g_2 \mid \mathbf{X} g_1 \mid \mathbf{F} g_1 \mid \mathbf{G} g_1 \mid g_1 \,\mathbf{U}\, g_2$
(same rules as LTL, only the base case is more complex / expressive)
Semantics: same rules as LTL, plus: $s \models \mathbf{E} g \iff$ there exists a path $\pi$ from $s$ with $\pi \models g$

CTL (computation tree logic): defined by Clarke & Emerson (1981) => Turing Award 2007, with J. Sifakis, for model checking
Tradeoff: expressiveness of specifications vs. efficiency of checking
=> CTL is a subset of CTL*, efficient to check, enough in many cases
CTL is a branching-time logic, like CTL*.
CTL quantifies over paths starting from a state => operators $\mathbf{X}$, $\mathbf{F}$, $\mathbf{G}$, $\mathbf{U}$ are immediately preceded by $\mathbf{A}$ or $\mathbf{E}$
=> syntax of path formulas simplified, directly using state formulas:
$g ::= \mathbf{X} f \mid \mathbf{F} f \mid \mathbf{G} f \mid f_1 \,\mathbf{U}\, f_2 \mid f_1 \,\mathbf{R}\, f_2$
Expressiveness: LTL and CTL are incomparable (neither includes the other); both are less expressive than CTL*.
Minimal set of operators:
- $f \land g = \neg(\neg f \lor \neg g)$
- $\mathbf{F} f = true \,\mathbf{U}\, f$
- $\mathbf{G} f = \neg \mathbf{F} \neg f$
- $\mathbf{A} f = \neg \mathbf{E} \neg f$
=> Operators $\neg$, $\lor$, $\mathbf{X}$, $\mathbf{U}$ and $\mathbf{E}$ suffice to express any formula.
CTL has 2 x 4 = 8 pairs of quantifier x temporal operator; all can be expressed using EX, EU and EG, e.g.:
- $\mathbf{AX} f = \neg \mathbf{EX} \neg f$
- $\mathbf{EF} f = \mathbf{E}[true \,\mathbf{U}\, f]$
- $\mathbf{A}[f \,\mathbf{U}\, g] = \neg \mathbf{EG} \neg g \land \neg \mathbf{E}[\neg g \,\mathbf{U}\, (\neg f \land \neg g)]$

CTL model checking: label each state with the subformulas that hold in it.
- $\neg$, $\lor$, $\land$: boolean combinations of existing labels
- $\mathbf{EX} f$: just label each state that has a successor labeled with $f$
- to discuss: two algorithms for the basic operators EU and EG

Idea for EU: backwards traversal from states labeled $f_2$, as long as $f_1$ holds
procedure CheckEU(f1, f2)
    T := {s | f2 ∈ L(s)}                            (f2 holds in s)
    forall s ∈ T do L(s) := L(s) ∪ {E[f1 U f2]}     (then E[f1 U f2] holds, label s)
    while T ≠ ∅ do                                   (still have candidates for search)
        choose s ∈ T; T := T \ {s}
        forall t with R(t, s) do                     (predecessors of s)
            if E[f1 U f2] ∉ L(t) and f1 ∈ L(t) then
                L(t) := L(t) ∪ {E[f1 U f2]}; T := T ∪ {t}

Fairness: need to extend CTL (semantics) with fairness.
Intuitively: decision fairness = if a decision (several transitions from a state) is repeated infinitely often, each branch is eventually taken.
Reformulate: each destination state of the decision is eventually reached.
Formally: a fairness constraint is a formula in temporal logic. A path is fair iff the constraint is infinitely often true along the path.
In LTL, we would write: $\mathbf{G}\,\mathbf{F}\, assumption \Rightarrow conclusion$
In particular: fairness constraint expressed as a set of states => a fair path passes infinitely often through the set

Augment the Kripke structure: $M = (S, S_0, R, L, F)$, with $F \subseteq \mathcal{P}(S)$ ($F$ = set of subsets of states, $F = \{P_1, \ldots, P_n\}$, $P_i \subseteq S$)
$inf(\pi) = \{s \mid s = s_i \text{ for infinitely many } i\}$   (set of states appearing infinitely often on $\pi$)
$\pi$ is a fair path $\iff \forall P \in F \;.\; inf(\pi) \cap P \neq \emptyset$   ($\pi$ passes infinitely often through each set from $F$)
For $\models_F$ ("holds fairly") replace "path" with "fair path" in the semantics.
For model checking, define a new atomic proposition fair: $fair \in L(s) \iff M, s \models_F \mathbf{EG}\, true$
=> fair-CTL model checking reduces to CTL for $AP \cup \{fair\}$
Complexity of model checking ($|R|$ = number of transitions):
- CTL model checking: $O(|f| \cdot (|S| + |R|))$ (linear in the size of the model and of the formula)
- CTL with fairness: $O(|f| \cdot (|S| + |R|) \cdot |F|)$
- LTL: PSPACE-complete, $O(|M| \cdot 2^{O(|f|)})$ (different type of algorithm, based on a tableau construction)
- CTL*: like LTL, $O(|M| \cdot 2^{O(|f|)})$
CTL: usually preferred, because of the polynomial (linear!) algorithm.
Spin uses LTL: exponential only in the size of the formula (usually small).

Behavior of composed systems emerges from component behavior.
For concurrently executing components: parallel composition
- synchronous: conjunction (simultaneous transitions)
  $R(V, V') = R_1(V_1, V_1') \land R_2(V_2, V_2')$, with $V = V_1 \cup V_2$
- asynchronous: disjunction (individual transitions)
  $R(V, V') = R_1(V_1, V_1') \land Eq(V \setminus V_1) \;\lor\; R_2(V_2, V_2') \land Eq(V \setminus V_2)$, where $Eq(U) = \bigwedge_{v \in U} (v = v')$
  - arbitrary interleaving between transitions of components
  - a transition modifies just the variables of one component
  - simultaneous transitions are deemed impossible
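The backward traversal that labels states with E[f1 U f2], discussed earlier in this lecture, can be sketched in C. This is an illustrative sketch under stated assumptions, not the lecture's implementation: states are numbered 0 to n-1, the transition relation is an adjacency matrix (so this version is quadratic in the number of states rather than linear in the size of the model), and the names struct kripke, MAX_STATES and check_eu are introduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 16  /* hypothetical bound, for the sketch only */

/* The part of a Kripke structure that the labeling needs:
   n states and a transition relation given as an adjacency matrix. */
struct kripke {
    int n;
    bool R[MAX_STATES][MAX_STATES];  /* R[s][t]: transition s -> t */
};

/* Label the states satisfying E[f1 U f2].
   f1[s], f2[s]: whether state s is already labeled with f1, f2.
   On return, eu[s] is true iff s satisfies E[f1 U f2]. */
void check_eu(const struct kripke *m, const bool f1[], const bool f2[], bool eu[])
{
    int work[MAX_STATES];  /* T: labeled states whose predecessors are still to be visited */
    int top = 0;

    assert(m->n >= 0 && m->n <= MAX_STATES);
    for (int s = 0; s < m->n; s++) {
        eu[s] = f2[s];     /* if f2 holds in s, then E[f1 U f2] holds in s */
        if (eu[s])
            work[top++] = s;
    }
    while (top > 0) {      /* still have candidates for the search */
        int s = work[--top];
        for (int t = 0; t < m->n; t++) {
            /* t is a not yet labeled predecessor of s in which f1 holds */
            if (m->R[t][s] && !eu[t] && f1[t]) {
                eu[t] = true;
                work[top++] = t;  /* each state is pushed at most once */
            }
        }
    }
}
```

Storing predecessor lists instead of a matrix would bring the cost of one call down to O(|S| + |R|), in line with the linear bound quoted for CTL model checking.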
Calling conventions: can use registers to pass values.
System V AMD64 ABI: first 6 (integer or pointer) args passed in rdi, rsi, rdx, rcx, r8, r9; return value in rax and rdx.

Detecting functions: important in reverse engineering.
- may not know all entry points
- may not be able to follow all function calls, e.g. indirect calls, through pointers in a table
- standard prologues / epilogues help the disassembler detect functions

jmp address
call address
Should address be absolute or relative to the program counter?
Relative: important to have position-independent code (can load at any address in memory).
Absolute jump / call instructions also exist.

cmp op1, op2: like (signed) subtraction, but does not change the left operand
test op1, op2: like bitwise AND, but does not change the left operand
Both set flags => use for conditional jumps.

Conditional jumps: based on a variety of flags (set by cmp / test)
JA (above)       CF = 0 and ZF = 0
JB (below)       CF = 1
JC (carry)       CF = 1 (same as JB)
JE (equal)       ZF = 1
JG (greater)     ZF = 0 and SF = OF
JL (less)        SF != OF
JO (overflow)    OF = 1
JS (sign)        SF = 1
JZ (zero)        ZF = 1
- also negations (JNA, JNB, etc.) + nonstrict comparisons (JLE, JGE, ...)
- some mnemonics mean the same thing: JNGE = JL
- conditional jumps have short versions with an 8-bit offset

Jump tables result from compilation of a switch statement:
    typedef enum { ADD, SUB, MUL, DIV, MOD, AND, OR, XOR } op_t;
    int compute(op_t op, int a, int b)    /* function name is a placeholder */
    {
        switch (op) {
        case ADD: return a + b;
        case SUB: return a - b;
        case MUL: return a * b;
        case DIV: return a / b;
        case MOD: return a % b;
        case AND: return a & b;
        case OR:  return a | b;
        case XOR: return a ^ b;
        default:  return 0;
        }
    }

Indirect calls through a table of function pointers:
    typedef int (*intfn_t)(int, int);
    int add(int a, int b) { return a + b; }
    int sub(int a, int b) { return a - b; }
    int mul(int a, int b) { return a * b; }
    int idiv(int a, int b) { return a / b; }
    intfn_t fntab[] = { add, sub, mul, idiv };
    int main(int argc, char *argv[])
    {
        if (argc != 2) return 1;
        int op =
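            atoi(argv[1]);

    if ((*adr = malloc(sizeof(intlist_t)))) { (*adr)->el = val; (*adr)->nxt = NULL; }

Marius Minea, marius@cs.upt.ro, 3 October 2016

Kinds of functions (names and format strings below are placeholders):
- computes and returns a value:
    double discrim(double a, double b, double c) { return b*b - 4*a*c; }
- has only an effect (e.g. prints a message):
    void errmsg(int code) { printf("error code: %d\n", code); }
- both (computes + writes: several statements):
    int sqr(int x) { printf("computing the square of %d\n", x); return x * x; }
- uses library math functions:
    double thirdside(double a, double b, double phi) { return sqrt(a*a + b*b - 2*a*b*cos(phi)); }
    int main(void) { printf("%f\n", thirdside(3, 5, atan(1))); return 0; }

Receiving a parameter is NOT reading it from input. Returning a value is NOT printing it.
A function will typically NOT ask for input. The smallest functions will only take parameters and return a result. This allows them to be composed and used anywhere.
A function will typically NOT print its result, just return it (printing is inflexible: may want different format, language, etc.).
We might write wrapper functions that ask for input, then call the computation function.
We might also write display functions that get a value and print it.

We solve a (computational) problem by writing a function.
Function parameters: the values used to compute the result; NOT read from input, but given in the function call: f(3, 7)
Functions return a result, produced with the statement: return expression ;
It must appear at the end of any path (branch) through the function, else the function won't return a result!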
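The advice above (compute and return instead of reading or printing, and a return on every path) can be illustrated with a short C sketch. The function names discriminant, num_real_roots and print_int are hypothetical, chosen for this example, and num_real_roots assumes a nonzero leading coefficient.

```c
#include <assert.h>
#include <stdio.h>

/* Computation function: takes its inputs as parameters and returns the result;
   it neither reads input nor prints. */
int discriminant(int a, int b, int c)
{
    return b * b - 4 * a * c;
}

/* Number of distinct real roots of a*x*x + b*x + c = 0, assuming a != 0.
   Every path through the function ends in a return statement. */
int num_real_roots(int a, int b, int c)
{
    int delta = discriminant(a, b, c);
    assert(a != 0);
    if (delta > 0)
        return 2;
    else if (delta == 0)
        return 1;
    else
        return 0;
}

/* Display function: gets a value and prints it. */
void print_int(int x)
{
    printf("%d\n", x);
}
```

Because the computation functions only return values, they compose freely: print_int(num_real_roots(1, -3, 2)) prints the count, while another caller could use the same result in a test or a further computation.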
Compiler warning when a return is missing on some path: control reaches end of non-void function

Using a function call:
- as a statement, f(5); the returned value is ignored
- to use the value, place the call in an expression, e.g. as a parameter: printf("%d", f(5)), etc.

Functions that return nothing (e.g., just print): declare the function with return type void
    void print_int(int n) { printf("%d\n", n); }
- returns on reaching the closing brace OR the statement return; (NO expression)
- use: standalone, in an expression statement: print_int(7);

Recursion = reduction to a simpler case of the same problem.
The base case is simple enough for direct computation (can / need no longer be reduced).
$x^n = 1$ if $n = 0$; $x$ if $n = 1$; $(x^2)^{n/2}$ if $n > 1$ even; $x \cdot (x^2)^{n/2}$ if $n > 1$ odd (integer division)
    int pow2(int x, unsigned n)
    {
        printf("pow2(%d, %u)\n", x, n);    /* trace each call */
        if (n == 0) return 1;
        if (n == 1) return x;
        return n % 2 ? x * pow2(x * x, n / 2) : pow2(x * x, n / 2);
    }
For $n \geq 1$ this needs $\lceil \log_2(n + 1) \rceil$ calls: pow2(5, 6) -> pow2(25, 3) -> pow2(625, 1)

Recursion solves a problem by reducing it to a simpler case of the same problem.
To use recursion, we must express the problem as a function:
- things given / known to the function are parameters (index of recursive sequence; problem size; etc.)
- the answer to the problem is the function result
Sometimes, the problem asks to do something (print) rather than compute a result.

A function body may have several statements (sequencing).
The function returns on reaching the closing brace OR a return statement.
More generally, a block (compound statement) can appear in place of any statement.
This is an example of recursion in the language syntax:
    statement ::= expression(optional) ;            (incl. function call)
                | return expression(optional) ;
                | { statement ... statement }

The conditional operator ? : selects from two expressions to evaluate.
The conditional statement if selects between two statements to execute:
    if ( expression ) statement1 else statement2
or
    if ( expression ) statement1
If the expression is true (nonzero), statement1 is executed, else statement2 is executed (or nothing, if the latter is missing).
Each branch has only one statement; if several statements are needed, these must be grouped in a block { }.
The parentheses ( ) around the condition are mandatory.

Printing the roots of a quadratic equation (function name and message strings are placeholders):
    void printsol(double a, double b, double c)
    {
        double delta = b * b - 4 * a * c;
        if (delta >= 0) {
            printf("root 1: %f\n", (-b - sqrt(delta)) / 2 / a);
            printf("root 2: %f\n", (-b + sqrt(delta)) / 2 / a);
        } else
            printf("no solution\n");
    }

Can rewrite the absolute value function
    int abs(int x) { return x > 0 ? x : -x; }
using the if statement:
    int abs(int x) { if (x > 0) return x; else return -x; }

Fibonacci sequence: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n > 1$
Inefficient to do direct recursion (exercise: how many calls?)
Can define Fibonacci words (strings): $S_0 = 0$, $S_1 = 01$, $S_n = S_{n-1} S_{n-2}$ (formed by string concatenation)
Write a function that prints $S_n$: problem = function; effect = print; concatenation = sequencing

Fractals are self-similar figures (a part of the figure looks like the whole figure) = recursion!
Box fractal: What is the base case? What defines a part of the figure?
http://mathworld.wolfram.com/BoxFractal.html

Elements of a recursive definition:
1. Base case: no recursive call = simplest case, defined directly
   e.g. in sequences: initial term $x_0$ of the recurrence; the empty list (for a list of elements)
   A missing base case is an error => recursion never stops!
2. The recurrence relation itself: defines a notion using a simpler case of the same notion
3. Proof / argument that recursion stops in a finite number of steps (e.g. a nonnegative measure that decreases on each application)
   for sequences: the index (smaller in the definition body but $\geq 0$)
   for recursive objects: size (component objects are smaller)

Are these well-formed recursive definitions?
? $x_{n+1} = 2 \cdot x_n$
? $x_n = x_{n+1} / 3$
? $a^n = a \cdot a \cdot \ldots \cdot a$ (n times)
? a sentence is a sequence of words
? a sequence is the concatenation of two smaller sequences
?
a string is a character followed by a string

A recursive definition must be well formed (conditions 1-3):
- something cannot be defined only in terms of itself
- one can only use other notions which are already defined
- computation has to stop at some point

A natural number (in base 10) can be defined / viewed recursively:
a number is a single digit, or: the last digit preceded by another number (in base 10)
We can find the two parts using integer division (with remainder):
    n = 10 * (n / 10) + n % 10           1457 = 10 * 145 + 7
    the last digit of n is n % 10        1457 % 10 = 7
    the number remaining in front is n / 10      1457 / 10 = 145
Problems with a simple recursive solution: sum of a number's digits; number of digits; largest / smallest digit, etc.
Solution, always:
- base case: for a single-digit number
- recurrence: combine the last digit with the result for n / 10
E.g. number of digits: 1, if number < 10; else 1 + number of digits of n / 10
    /* reconstruction; maxdigit is a placeholder name */
    unsigned max(unsigned a, unsigned b) { return a > b ? a : b; }
    unsigned maxdigit(unsigned n) { return n < 10 ? n : max(maxdigit(n / 10), n % 10); }

    double s(unsigned n) { return n == 0 ? 1 : s(n-1) + cos(n); }
    int main(void) { printf("%f\n", s(1000000)); return 0; }
    ./a.out
    Segmentation fault

Code executes sequentially (except for branch / call / return).
When calling a function, must remember where to return (right after the call).
Must remember parameter and local values, to keep using them.
These are placed on the stack, since nested calls return in the opposite order they were made:
must restore values in reverse order of saving (last in, first out).
If recursion is very deep, the stack may be insufficient => program crash;
even otherwise, save / call / restore may be expensive.

$S_0 = 1$, $S_n = S_{n-1} + \cos n$
We know we'll have to add $\cos n$ (but not yet to what)
=> can anticipate and accumulate the values we need to add.
When reaching the base case, use the accumulator (partial result).
    double s2(double acc, unsigned n) { return n == 0 ? acc : s2(acc + cos(n), n-1); }
    double s(unsigned n) { return s2(1, n); }      /* Program now works! */

A function is tail-recursive if the recursive call is the last operation in the function:
- no computation done after the call (e.g., with the result)
- the result (if any) is returned unchanged between calls
- parameter and local values are no longer needed
=> can replace the recursive call with a jump, and return the value at the end (base case)
(Optimizing) compiler converts tail recursion to iteration (loop) => need not worry about efficiency

Structure of a tail-recursive function:
Base case: are we done?
    return the (partial) result
Recursive step (not done): compute the new partial result; call the recursive function with the new partial result (usually an extra parameter, besides the initial input)
Exercise: rewrite Fibonacci. Extra parameters: last, previous number; stopping condition: all iterations done.

ASCII = American Standard Code for Information Interchange
Characters are represented as a numeric code = index in this table, e.g. '0' == 48, 'A' == 65, 'a' == 97, etc.
(ASCII table, columns 0-F, rows 0x00, 0x10, 0x20, ...; the first row holds the control characters \0, \a, \b, \t, \n, \v, \f, \r)
Prefix 0x denotes hexadecimal (base 16).

Characters can be stored in a byte (CHAR_BIT >= 8 bits).
char can be signed, at least -128 to 127, or unsigned, at least 0 to 255. Both are included in int.
Character constants are written between (single) quotes ' '. They are integers in expressions.
Digits, lowercase letters and uppercase letters are consecutive
=> '7' == '0' + 7    '5' - '0' == 5    'E' - 'A' == 4    'f' == 'a' + 5

Escape sequences (textual representation) for special chars:
    '\0' null         '\n' newline
    '\a' alarm        '\r' carriage return
    '\b' backspace    '\f' form feed
    '\t' tab          '\'' single quote
    '\v' vertical tab '\\' backslash

Character I/O functions, in stdio.h:
int putchar(int c);   (sample use): putchar('7')
writes an unsigned char (given as int); returns its value, or EOF (a negative constant, typically -1) on error
    int main(void) { putchar('A'); putchar('\n'); putchar(getchar()); return 0; }    /* arguments are placeholders */
A char is a small integer (stored in one byte): 'A' is just another way of writing 65.

Conditional expression: condition ? expr1 : expr2
Everything is an expression: expr1 or expr2 may be conditional expressions themselves (if we need more questions to find out the answer).
$f(x) = -6$ if $x < -3$; $2x$ if $-3 \leq x \leq 3$; $6$ if $x > 3$
    int f(int x)
    {
        return x < -3 ? -6
             : (x > 3 ? 6 : 2*x);    /* x >= -3: we still need to ask; if x is not > 3, it must be in [-3, 3] */
    }
The conditional expression is an expression => may be used anywhere an expression is needed.
Example: as an expression of type string in puts (function that prints a string to stdout, followed by a newline); the strings below are placeholders:
    void printsign(int n)
    {
        puts(n == 0 ? "zero"
           : n > 0 ? "positive"
           : "negative");
    }
Note the layout for readability: one question per line.

Expression: arithmetic operations: x + 1; function call: fact(5)
Statement: return n + 1;
Any expression followed by ; becomes a statement:
    n + 3;              (computes, but does not use the result)
    printf("hello!");   we do not use the result of printf but are interested in the side effect, printing
printf returns an int: number of chars written (rarely used).
Statements contain expressions. Expressions don't contain statements.
Statements are written and executed in order (sequentially).
With decision, recursion and sequencing we can write any program.

Compound statement (block): several statements between { }
A function body is a compound statement:
    int main(void)
    {
        int c = getchar();     /* statement */
        printf("read: ");      /* statement; placeholder string */
        putchar(c);
        return 0;
    }
A compound statement is considered a single statement.
May contain declarations: anywhere (C99 / C11), at start (C89).
All other statements are terminated by a semicolon.
The sequencing operator is the comma: expr1 , expr2
Evaluate expr1, ignore its value; the value of the expression is that of expr2.

The branches of an if can be any statements => also if statements => can chain decisions one after another (message string is a placeholder):
    void compute(char op, int a, int b)
    {
        if (op == '+') printf("%d\n", a + b);
        else if (op == '-') printf("%d\n", a - b);
        else puts("bad operator");
    }
The checks op == '+' and op == '-' are mutually exclusive => do NOT write
    if (op == '+') printf("%d\n", a + b);
    if (op == '-') printf("%d\n", a - b);
It is pointless to do the second test if the first was true (op cannot be both + and - at the same time).
The proper code is with chained ifs (or a switch statement).
If each branch ends with returning a value, the else is not needed: we only get to a branch if the previous condition was false (else the function will have returned):
    int compute(char op, int a, int b)
    {
        if (op == '+') return a + b;
        if (op == '-') return a - b;
        puts("bad operator");
        return 0;
    }
Often, we first deal with error cases, then do the actual processing:
    void printnat(unsigned n)
    {
        if (n >= 10) printnat(n / 10);
        putchar('0' + n % 10);
    }
    int main(void) { printnat(312); return 0; }

Marius Minea, 5 October 2015
Course references:
Principles of Programming Languages, Uday Reddy, Univ. of Birmingham
Program Analysis and Understanding, Jeff Foster, Univ. of Maryland

Lambda calculus: developed in the 1930s by Alonzo Church; initially
typed, then untyped fragment

Formalizing computation:
- lambda calculus [Church]
- Turing machines [Turing]
- general recursive functions [Church, Kleene, Rosser]
These three computational processes are equivalent, i.e., the class of computable functions (by recursion or $\lambda$-calculus) is precisely the class of effectively calculable ones (by a Turing machine).
Church-Turing thesis: these models express what is effectively computable
=> Lambda calculus is a model of computation

We've seen: computation is done by functions; in general, both function and arguments can be expressions
- $e ::= x$   (variable)
- $\mid \lambda x . e$   (function abstraction (definition))
- $\mid e_1 \, e_2$   (function application)
Basic ideas:
- functions are values (no split between functions and args / results)
- functions need not be named ($\lambda$-abstractions suffice)
- functions are all one needs (can express numbers, if-then, etc.)
Syntax conventions:
- the scope of the abstraction extends as far right as possible
- application is left-associative: $e_1 \, e_2 \, e_3$ means $(e_1 \, e_2) \, e_3$

The function abstraction $\lambda x . e$ binds the occurrence of $x$ in $e$.
Intuitively: inside $e$, $x$ is the argument; outside $e$ it has no meaning.
Set of free variables of an expression:
- $FV(x) = \{x\}$
- $FV(\lambda x . e) = FV(e) \setminus \{x\}$
- $FV(e_1 \, e_2) = FV(e_1) \cup FV(e_2)$
A term is closed if it has no free variables. A variable that is not free is called bound.

Calling a function means using the (actual) argument in place of the (formal) parameter.
In most languages, this means evaluating the argument expressions; in lambda calculus, we will just do syntactic substitution.
To correctly compute with $\lambda$ expressions, we need to define substitutions.
Denote by $e_1[x \to e_2]$ the substitution of $x$ by $e_2$ in $e_1$ (various other notations: $e_1[x := e_2]$, $e_1[x / e_2]$, $e_1[e_2 / x]$)
Define:
- $y[x \to e] = e$ if $y$ is the same as $x$; $y$ if $y$ is different from $x$
- $(\lambda y . e_1)[x \to e_2] = \lambda y . e_1$ if $y$ is the same as $x$; $\lambda y . (e_1[x \to e_2])$ if $y$ is different from $x$ and $y \notin FV(e_2)$
  (otherwise occurrences of $y$ in $e_2$ would be captured by $\lambda y . e_1$)
- $(e_1 \, e_2)[x \to e] = (e_1[x \to e]) \, (e_2[x \to e])$

$\alpha$-conversion (bound variables can be renamed): $\lambda x . e = \lambda y . (e[x \to y])$ if $y \notin FV(e)$
Then we can substitute
$(\lambda y . e_1)[x \to e_2]$ also when $y \in FV(e_2)$:
- first rename $y$ to some fresh variable $z$: $\lambda y . e_1 = \lambda z . e_1[y \to z]$
- then substitute $x$ with $e_2$: $\lambda z . e_1[y \to z][x \to e_2]$

$\beta$-conversion (or $\beta$-reduction): $(\lambda x . e_1) \, e_2 = e_1[x \to e_2]$
This is the computation step for lambda expressions. We write: $(\lambda x . e_1) \, e_2 \to_\beta e_1[x \to e_2]$
$\eta$-conversion: $\lambda x . e \, x = e$ if $x \notin FV(e)$ (simplifies application + abstraction)
Two terms are equivalent if one can be converted to the other by the three conversion rules.

A $\lambda$-expression may have several $\beta$-reducible subexpressions (redexes) => which one to apply first?
Church-Rosser theorem: if a term reduces to two different terms, these in turn reduce to a common term (diamond property):
$e \to^* e_1 \land e \to^* e_2 \Rightarrow \exists e' \;.\; e_1 \to^* e' \land e_2 \to^* e'$

Operator precedence: allows disambiguating expressions, without need for excess parentheses.
Associativity: how to evaluate operators with the same precedence
- left-associative, right-associative
- operators may be associative in math, but not in programming languages
Evaluation order of operands for a given operator: specified or unspecified.

Reduction strategies:
- normal order: leftmost outermost redex first; also reduces under $\lambda$
- call by name: leftmost outermost redex first; does not reduce under $\lambda$
- eager evaluation (call by value): only reduce $(\lambda x . e_1) \, e_2$ when the argument $e_2$ is a value
- in programming language practice: lazy evaluation: only reduce the argument if needed, but do not duplicate expressions (evaluate at most once)

5 October 2016

Types of black-box testing:
- equivalence class partitioning
- boundary testing
- cause-effect analysis
- exploratory testing
Product is viewed as an opaque system (no access to
internal details - this includes source code)
- applicable to any product
- no effort for source code analysis
- applicable from simple to complex and in a variety of situations

Or:
Function testing
- test each function in isolation; basic functionality
- tests are credible, easy to evaluate, not very powerful
Domain testing
- essence: sample equivalence classes through representatives
- initially one variable at a time, then combinations
- well-chosen values: powerful, informative tests
Specification-based testing
- tests for every claim in the specification / requirements list / model / manual
- conformance is very significant; choose representative tests
- can go deeper: find errors/omissions/ambiguities/limit cases in the spec
Risk-based testing
- imagine a way the program could fail, test for it
- tests must be powerful, credible, motivating
Stress testing: several definitions
1) under burst of activity
2) at/beyond specified limits, to cause failure (IEEE std.)
3) to see how the program fails (important!)
Regression testing
- test set designed for reuse after every program change
- no longer powerful, but well documented for maintenance
User testing
- real, not simulated users (beta testing)
- using specified scenarios, or freely
- credible, motivating, not always powerful (depends on user)
Scenario testing
- specific use case; may be model-based
- credible, motivating, easy to evaluate, complex
- going deeper: use scenario in limit/hostile case
State-model-based testing
- model: finite-state automaton
- analyze model, then product with model-based tests
High-volume automated testing
Exploratory testing
- the tester actively guides the testing process
- designs new tests based on info offered by existing tests
1. Start with simple (obvious) tests (grave if they fail)
2. Test each function, understand behavior before criticizing
3. Test broadly before deeply. Cover program before focusing
4. More powerful tests, boundary conditions
5. Expand scope, look for challenges
6. Freestyle exploratory testing

Equivalence class partitioning: analyze domain of values for each variable or input, identify
sets for which we assume tests behave alike
=> used to generate a set of "interesting" conditions for testing
Desirable: a test case should cover several relevant conditions (should reduce number of conditions to analyze by more than one)
For every condition: tests with valid and invalid values
Myers suggests using a table of the form

Condition | Valid equiv. classes | Invalid equiv. classes

Depending on the variable type/domain:
- for an interval: one valid case (inside), two invalid ones (on both sides); will refine for boundary testing
- for a fixed (specified) number: one valid case, two invalid cases (larger, smaller)
- for an enumeration type: each value, plus an invalid one
Combining equivalence classes into test cases:
- cover as many valid classes as possible with one test case
- generate a separate test for each invalid class (if combined, an invalid condition may mask another)

Declaring dimensions of an array in FORTRAN [Myers]
DIMENSION array-descrp ( , array-descrp )*
array-descrp ::= name ( dim ( , dim )* )
name ::= letter ( letter | digit )*   (1-6 chars)
dim ::= [ lower-bound : ] upper-bound
bound ::= int-constant | name
-65534 2001

UI attacks: refresh screen (done completely?)
Try to overstep internal limits, e.g. create table of maximum size, then add a row
Computations with invalid operators/operands
Test recursive inclusions (frame in frame; footnote in footnote, etc.)

6 October 2016

Recap: access control = a mechanism to allow or deny an entity's access to a resource
We distinguish:
- a set of subjects or principals S
- a set of objects O
- a set of access modes A
Simplest: A = {observe, alter}. Usually not enough
The Bell-LaPadula model refines this to: A = {execute, read, append, write}

Access control matrix: the simplest and most general organization of access control
Two dimensions: subjects and objects
- every matrix entry in S x O: set of rights/permissions
- a subject may also be an object (e.g. a process): has the right to read / write (to) / execute another process

        file1   /etc/passwd   /bin/rlogin
Alice   r, w    r             x
Bob     r       -             r, x

Access control list: a representation of the access control matrix where each object has associated a list of subjects and their permissions
simple case: Unix permissions
Richer set of permissions: Andrew File System (distributed)
- read
- list (for directories: content)
- insert (new file in directory)
- delete (file from directory)
- write
- lock (may use lock in directory)
- administer
Grant right: the right to give others permission
Admin right: the right to give oneself permissions
A subject may not give rights it does not possess to another
Q: in Unix, the owner of a file may grant others (group/others) read rights on the file, even if (s)he does not have these rights. Is the above principle violated?
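The access control matrix from the recap can be sketched in C as a two-dimensional array of permission bit sets. This is only an illustration: the names acm, allowed and the R/W/X flags are assumptions introduced here, and the entries reproduce the Alice/Bob example table. A row of the array is what a subject may do; a column corresponds to the access control list of an object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits for the sketch. */
enum { R = 1, W = 2, X = 4 };
enum subject { ALICE, BOB, NSUBJ };
enum object  { FILE1, ETC_PASSWD, BIN_RLOGIN, NOBJ };

/* acm[s][o] = set of rights subject s holds over object o
   (entries taken from the example table in the notes). */
static const unsigned acm[NSUBJ][NOBJ] = {
    [ALICE] = { [FILE1] = R | W, [ETC_PASSWD] = R, [BIN_RLOGIN] = X     },
    [BOB]   = { [FILE1] = R,     [ETC_PASSWD] = 0, [BIN_RLOGIN] = R | X },
};

/* True iff subject s holds every right in mode over object o. */
bool allowed(enum subject s, enum object o, unsigned mode)
{
    assert(s < NSUBJ && o < NOBJ);
    return (acm[s][o] & mode) == mode;
}
```

For instance, allowed(ALICE, FILE1, R | W) holds, while allowed(BOB, ETC_PASSWD, R) does not. A real system would rarely store the full matrix, since it is sparse; it would keep per-object lists (ACLs) or per-subject lists instead.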
Is it possible to design a correct access control system?
Def: A system is safe with respect to a given right (of a subject over an object), if there is no sequence of transitions (operations) by which the right could be added, assuming it does not exist at first
Theorem: The safety of an arbitrary system in a state, relative to a given access right, is undecidable
Proof: a Turing machine can be reduced to (encoded into) such a system, essentially because of the ability to create objects
There are simpler subclasses of systems that are decidable
- if there are no create primitives
- if the systems are monotonic (if create, no destroy) and only single conditions are allowed

Discretionary access control: allows a user (typically: owner) to set mechanisms by which access is granted/forbidden
Mandatory access control: access is controlled by the system, cannot be changed by the user
usually: based on a set of rules (rule-based access control)
Q: What are advantages and disadvantages of each category?

Role-based access control: system-determined policy, depending on the active role of a subject
3 (or more) levels: subject → role → object
Permissions are defined depending on the role
a subject may have access when acting in one role, not in another
role hierarchy (some may be included in others): attending physician ⊂ physician ⊂ medical personnel
can model various requirements, e.g. separation of duty
ex.: a bank loan must be approved by two different bank officers
Security policies must be carefully specified => policy description languages

Norman Hardy, The Confused Deputy (or why capabilities might have been invented), ACM SIGOPS Operating Systems Review, 22(4), 1988
Never Separate An Object From its Authority
Who is to blame?
- The code to deposit the debugging output in the file named by the user?
- Must the compiler check to see if the output file name is in another directory?
- Should the compiler check for directory name SYSX?
- Should the compiler check for the name (SYSX)BILL?
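Returning to role-based access control: the role hierarchy attending physician ⊂ physician ⊂ medical personnel can be sketched as a parent table that a permission check walks upward. Everything except the three role names is an assumption made for the illustration (the permission names, the one-parent hierarchy, and the function role_allows).

```c
#include <assert.h>
#include <stdbool.h>

enum role { MEDICAL_PERSONNEL, PHYSICIAN, ATTENDING_PHYSICIAN, NROLES };

/* Hypothetical permissions, one bit each. */
enum { VIEW_RECORD = 1, PRESCRIBE = 2, DISCHARGE = 4 };

/* parent[r] = the more general role that includes r, or -1 for the top. */
static const int parent[NROLES] = {
    [MEDICAL_PERSONNEL]   = -1,
    [PHYSICIAN]           = MEDICAL_PERSONNEL,
    [ATTENDING_PHYSICIAN] = PHYSICIAN,
};

/* Permissions attached directly to each role (assumed for the example). */
static const unsigned own_perms[NROLES] = {
    [MEDICAL_PERSONNEL]   = VIEW_RECORD,
    [PHYSICIAN]           = PRESCRIBE,
    [ATTENDING_PHYSICIAN] = DISCHARGE,
};

/* A role allows perm if it, or any more general role it belongs to, grants it. */
bool role_allows(enum role r, unsigned perm)
{
    assert(r < NROLES);
    for (int cur = r; cur != -1; cur = parent[cur])
        if (own_perms[cur] & perm)
            return true;
    return false;
}
```

With these assumed tables, an attending physician may view records (inherited from medical personnel), while a plain physician may not discharge. A subject's access then depends on which role is currently active, not on the subject's identity.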
Capability: term with slightly different meanings:
- in one usage, an identifier that denotes an object and the rights associated with it
  ex.: file descriptor/handle (on open, the access mode is also set)
- in another, a capability is the set of rights of a subject (corresponds to a row in the access control matrix)
E.g.: POSIX/Linux capabilities
capabilities for a new process: new = forced | (allowed & inheritable)
Examples: CAP_CHOWN, CAP_KILL, CAP_SETUID, CAP_SETPCAP

inspired from military domain
Defines security levels, e.g. public ...

=> sets of states can be represented by logic formulas
- a transition s → s': a formula over V ∪ V'
  V' = copy of V (next-state variables)
  ex.: (semaphore = red) ∧ (semaphore' = green)
- set of all transitions: transition relation = a formula R(V, V')

Formal verification 2, Marius Minea: Model Checking Basics

Kripke structure = labeled finite-state automaton M = (S, S0, R, L)
- S: finite state set
- S0 ⊆ S: set of initial states
- R ⊆ S × S: total transition relation, ∀s ∈ S ∃s' ∈ S. (s, s') ∈ R (from every state there is at least one transition)
- L : S → 2^AP: state labeling function
AP = set of atomic propositions (observations that appear in formulas/properties/specifications)
Examples:
- a state is stable or not
- define the proposition bad ::= red_recvd > 1 (Spin project)
Path (trajectory): infinite sequence of states starting from s0: π = s0 s1 s2 ..., with R(s_i, s_{i+1}) for all i ≥ 0

• sequential circuits: a variable for each state element (register) and for primary inputs; instantaneous combinational propagation assumed
• asynchronous circuits: one variable for each signal (in more complex/accurate models: explicit physical time)
• programs: declared variables + program counter (for procedures, need to keep track of local variables on stack during time of procedure activation; potentially infinite-state)

Types of composition (deriving system behavior from behavior of components)
Synchronous: conjunction (simultaneous transitions)
R(V, V') = R1(V1, V1') ∧ R2(V2, V2'), V = V1 ∪ V2
Asynchronous:
disjunction (individual transitions)
R(V, V') = R1(V1, V1') ∧ Eq(V \ V1) ∨ R2(V2, V2') ∧ Eq(V \ V2)
where Eq(U) = ∧_{v ∈ U} (v = v')
- arbitrary interleaving between component transitions
- a transition changes just the variables of one component
- simultaneous transitions considered impossible
Programs are usually modeled asynchronously (there is no physical synchronization between instructions of concurrent programs)

Reactive systems:
- interact with the environment (reaction to a given stimulus)
- often have infinite execution => a computation = infinite sequence of states
=> it is not enough to represent input-output behavior
- Examples: a given (error) state is not reached; the system does not deadlock
More generally: properties described in temporal logic (truth with temporal modalities)
- used starting in antiquity for reasoning about time
- formalized and applied by Pnueli (1977) to concurrent programs

Linear temporal logic (LTL):
- defined by Pnueli in 1977 (Turing Award 1996)
- describes events along an execution trace => linear structure
e.g. an event happens in the future; a property is invariant starting from a given timepoint; an event follows another event
Temporal operators (truth modalities along an execution trace):
• X: in the next state
• F: sometime in the future (incl. now)
• G: globally (in every future state, starting now)
• U: until; prop1 must hold until prop2 appears
sometimes we also define R (release): appearance of prop1 releases the need for prop2

- we wish a property to hold for all trajectories => we use the universal quantifier A
- formulas are of the form A f, where f is a path formula
- Syntax of path formulas:
f ::= p (for p ∈ AP) | ¬f | f1 ∨ f2 | f1 ∧ f2 | X f | F f | G f | f1 U f2 | f1 R f2

Denote M, s ⊨ f: in the model M, state s satisfies f; π^i = suffix of the path π = s0 s1 s2 ... starting at s_i
M, s ⊨ p      ⇔ p ∈ L(s)
M, s ⊨ A f    ⇔ ∀ path π from s: M, π ⊨ f
M, π ⊨ p      ⇔ M, s ⊨ p, for s the first state of π
M, π ⊨ X f    ⇔ M, π^1 ⊨ f
M, π ⊨ F f    ⇔ ∃k ≥ 0. M, π^k ⊨ f
M, π ⊨ G f    ⇔ ∀k ≥ 0. M, π^k ⊨ f
M, π ⊨ f U g  ⇔ ∃k ≥ 0. M, π^k ⊨ g ∧ ∀j < k. M, π^j ⊨ f
M, π ⊨ f R g  ⇔ ∀k ≥ 0. (∀j < k. M, π^j ⊭ f) ⇒ M, π^k ⊨ g

CTL*: alternative model, the computation tree: infinite unfolding of state-transition system starting from initial state
in addition to LTL operators: existential quantifier E (there exists a path)
Two types of formulas:
- state formulas, evaluated in a state
  f ::= p (where p ∈ AP) | ¬f1 | f1 ∨ f2 | f1 ∧ f2 | E g | A g   (where g = path formula)
- path formulas, evaluated along a path
  g ::= f (where f = state formula) | ¬g1 | g1 ∨ g2 | g1 ∧ g2 | X g1 | F g1 | G g1 | g1 U g2 | g1 R g2
Semantics: similar to LTL, plus:
M, s ⊨ E g ⇔ ∃ a path π from s such that M, π ⊨ g

• f ∧ g = ¬(¬f ∨ ¬g)
• F f = true U f
• G f = ¬F ¬f
• f R g = ¬(¬f U ¬g)
• A f = ¬E ¬f
Operators ¬, ∨, X, U and E suffice to express any CTL* formula

CTL (Computation Tree Logic) [Clarke, Emerson 1981]
- sufficient in many cases, but simpler => more efficient algorithms
- branching structure, like CTL*
- quantifies over all possible execution paths from a state
- operators X, F, G, U, R must be immediately preceded by A or E
- syntax of path formulas: g ::= X f | F f | G f | f1 U f2 | f1 R f2

10 combinations, all expressible using EX, EG and EU:
• AX f = ¬EX ¬f
• EF f = E [true U f]
• AF f = ¬EG ¬f
• AG f = ¬EF ¬f
• A [f U g] = ¬EG ¬g ∧ ¬E [¬g U (¬f ∧ ¬g)]
• E [f R g] = ¬A [¬f U ¬g]
• A [f R g] = ¬E [¬f U ¬g]

Examples:
• EF finish: it is possible to reach a state in which finish = true
• AG (send → AF ack): any send is eventually followed by an ack
• AF AG stable: in any execution, from a given moment on, stable holds forever
• AG (req → A [req U grant]): a req always stays active until receiving a grant
• AG AF ready: on any path, ready holds an infinite number of times
• AG EF restart: from any state it is possible to get to the restart state

CTL and LTL are incomparable:
- A F G p is in
LTL, has no CTL equivalent
- AG EF p is in CTL, has no LTL equivalent
- their disjunction is in CTL*, but not in CTL, nor LTL
Some techniques (compositionality, abstraction) need restrictions: typically, only the universal quantifier A is allowed
- ACTL (included in CTL, incomparable to LTL)
- ACTL* (included in CTL*, more expressive than LTL)

Fairness. In practice: reasonable assumptions of the sort:
- an arbiter does not continuously ignore a particular request
- a continuously retransmitted message reaches its destination
= properties which can be expressed in CTL* but not CTL
=> define a new semantics for CTL with fairness
A fairness constraint is a formula in temporal logic
A path is fair if each constraint is true infinitely often along the path
in particular: constraint expressed as a set of states: a fair path passes through that set infinitely often

Augment Kripke structure, M = (S, S0, R, L, F), by F ⊆ 2^S (F = set of state sets, {P1, ..., Pn}, Pi ⊆ S)
inf(π) = {s | s = s_i for infinitely many i} (set of states appearing infinitely often on π)
π is fair ⇔ ∀P ∈ F. inf(π) ∩ P ≠ ∅ (π passes infinitely often through any set in F)
Denote ⊨_F the satisfaction relation with fairness
Modified clauses in CTL semantics:
M, s ⊨_F p   ⇔ there is a fair path from s and p ∈ L(s)
M, s ⊨_F E g ⇔ ∃ fair path π from s with M, π ⊨_F g
M, s ⊨_F A g ⇔ ∀ fair paths π from s, M, π ⊨_F g

Model checking problem: given a Kripke structure M = (S, S0, R, L) and a formula f in temporal logic, find the set of states in S that satisfy f: {s ∈ S | M, s ⊨ f}
The specification is satisfied if all initial states satisfy f: ∀s0 ∈ S0. M, s0 ⊨ f
- independently, Clarke & Emerson, resp. Queille & Sifakis (1981)
- initially: 10^4 - 10^5 states; currently, with symbolic techniques: ca. 10^100 states

- Decompose according to the structure of formula f
For any s ∈ S, compute l(s) = set of subformulas of f true in s
- initially l(s) = L(s)
- trivial for logic connectives ¬, ∨, ∧
- EX f: label any state with a successor labeled by f
- Other basic operators: EU and EG

E [f1 U f2]: backwards traversal from states satisfying f2, as long as f1 holds
procedure CheckEU(f1, f2)
  T := {s | f2 ∈ l(s)};
  forall s ∈ T do l(s) := l(s) ∪ {E [f1 U f2]};
  while T ≠ ∅ do
    choose s ∈ T; T := T \ {s};
    forall s1 with R(s1, s) do
      if E [f1 U f2] ∉ l(s1) ∧ f1 ∈ l(s1) then
        l(s1) := l(s1) ∪ {E [f1 U f2]};
        T := T ∪ {s1};

EG f: consider only states that satisfy f
Traverse backwards starting from strongly connected components (SCC)
procedure CheckEG(f)
  S' := {s | f ∈ l(s)};
  SCC := {C | C is a nontrivial SCC in S'};
  T := ∪_{C ∈ SCC} {s | s ∈ C};
  forall s ∈ T do l(s) := l(s) ∪ {EG f};
  while T ≠ ∅ do
    choose s ∈ T; T := T \ {s};
    forall s1 with s1 ∈ S' ∧ R(s1, s) do
      if EG f ∉ l(s1) then
        l(s1) := l(s1) ∪ {EG f};
        T := T ∪ {s1};

Consider the fairness constraint F = {P1, ..., Pk}, with Pi ⊆ S
Let fair be a new atomic proposition, true in s iff there is a fair path starting from s
Thus fair ∈ L(s) ⇔ M, s ⊨_F EG true
For the other operators, the problem is reduced to ordinary model checking:
M, s ⊨_F p           ⇔ M, s ⊨ p ∧ fair
M, s ⊨_F EX f        ⇔ M, s ⊨ EX (f ∧ fair)
M, s ⊨_F E [f1 U f2] ⇔ M, s ⊨ E [f1 U (f2 ∧ fair)]
For M, s ⊨_F EG f we
modify the previous algorithm, considering only SCCs C with ∀i. C ∩ Pi ≠ ∅ (that contain at least a state from each component of the fairness constraint)

Complexity:
- model checking CTL: O(|f| · (|S| + |R|)) (linear in size of model and formula)
- CTL with fairness F: O(|f| · (|S| + |R|) · |F|)
- LTL: PSPACE-complete, |M| · 2^O(|f|)
  different type of algorithm, based on a tableau (automaton) construction
- CTL*: like LTL, |M| · 2^O(|f|)
CTL: often preferred due to the polynomial algorithm
but also in LTL, the exponential is in the size of the formula (small)

Marius Minea, marius@cs.upt.ro
18 October 2017

Scope of identifiers: where is an identifier visible?
- block scope: from declaration to end of enclosing }
- file scope: if declared outside any block
- also: function prototype scope (ID in function header), function scope (labels: can't jump out)
- if redeclared, the outer scope is hidden while the inner scope is in effect
Linkage: how do same names in different scopes/files link? do they refer to the same object?
- external linkage: same in all translation units (files) making up the program
  default for functions and file scope identifiers; explicit with an extern declaration
- internal linkage: same within one translation unit; if declared static
- no linkage: each declaration denotes a distinct object (for block scope)
File scope identifiers declared with keyword static have internal linkage (are not linked to objects with same name in other files)

Storage duration
static: if declared static, lifetime is that of the program
in function: local scope but preserves value between calls
initialization done only once, at start of lifetime

    #include <stdio.h>

    int counter(void) {
      static int cnt = 0;   /* initialized once, keeps its value between calls */
      return cnt++;
    }

    int main(void) {
      printf("%d\n", counter());
      printf("%d\n", counter());
      return 0;
    }

- automatic: for variables declared with block scope; lifetime: from block entry to exit; re-initialized every time
- static: lifetime is program execution; initialized once
- allocated: with malloc
- thread: for thread local objects (since C11)

An identifier can be declared multiple times, but defined only once
A declaration with initializer is a definition
A file scope declaration with no initializer and no storage class specifier or with static is a tentative definition
several tentative definitions for the same object must match; become a definition by end of translation unit
functions: define in one file, declare in all others
variables: define in one file, declare extern in all others
Can put declarations in a header file, and include where needed

Symbolic model checking. Binary decision diagrams
20 Oct 2005, Formal verification, Lecture 3, Marius Minea

Explicit-state model checking: need to represent each state individually
=> size of the state space severely limits applicability (size of a state determines how many states we can represent in memory)
- typically, limited to a few million states
State explosion: for composed systems, state space is product of component state spaces => exponential in number of components
=> Much of the focus in formal verification is scaling to large state spaces
If the reachable state set is much smaller than the potential complete state space, can try to encode reached states using fewer bits (hashing, used in SPIN)
However, this is an approximation: on reaching an already hashed state, search stops (even though actual
state may be different)
=> part of state space may remain unexplored => method is not sound

Problem: compute the set of reachable states (EF true)
- by forward traversal of graph
- R: set of explored states; F: frontier
With R = ∅; F = S0
  while (F ≠ ∅)
    choose s ∈ F; F := F \ {s}; R := R ∪ {s}
    forall s' with s → s'
      if s' ∉ F ∪ R then F := F ∪ {s'}
=> The algorithm can also be expressed over sets of states:
R = set of reached states (reachable from the initial states), F = frontier (states reached in the current step)
With R = ∅; F = S0
  while (F ⊄ R)
    R := R ∪ F
    F := {s' ∈ S | ∃s ∈ F. s → s'}    (or F := F \ R, with test F ≠ ∅)
The loop terminates: R grows in each iteration, but S is finite

• A new approach, based on exploring state sets
- idea: a set may sometimes be represented (by a formula) in a much more compact way than individually representing each state
- need: efficient representation and manipulation for state sets and transition relation
Symbolic model checking [McMillan'92]
- with binary decision diagrams (BDDs) [Bryant'86]
• key idea 1: working with state sets
- used also for infinite state sets (continuous-time or hybrid systems)
• key idea 2: iterative computation until no more change => notion of fixpoint

Def: x ∈ D is a fixpoint for f : D → D if f(x) = x
Def: A lattice is a partially ordered set in which any finite subset has a least upper bound and a greatest lower bound
Ex: powerset (set of subsets) P(S) of S, with ⊆ as order
- We work with functions τ : P(S) → P(S) over the lattice P(S)
- We regard S' ⊆ S as a predicate over S: S'(s) = true ⇔ s ∈ S'
  in particular: ∅ = false, S = true
=> τ : P(S) → P(S) is a predicate transformer
Def:
• τ is monotone if P ⊆ Q ⇒ τ(P) ⊆ τ(Q)
• τ is union-continuous if for any sequence P1 ⊆ P2 ⊆ ... we have τ(∪i Pi) = ∪i τ(Pi)
• τ is intersection-continuous if for any
sequence P1 ⊇ P2 ⊇ ... we have τ(∩i Pi) = ∩i τ(Pi)

A monotone predicate transformer over P(S) always has
- a minimal fixpoint, denoted μZ.τ(Z)
- and a maximal fixpoint, denoted νZ.τ(Z) [Tarski]
If S is finite and τ is monotone, then τ is continuous for union and intersection
τ monotone ⇒ τ^i(False) ⊆ τ^{i+1}(False) and τ^i(True) ⊇ τ^{i+1}(True)
If τ is monotone and S is finite, there exist i, j ≥ 0 such that ∀k ≥ i, τ^k(False) = τ^i(False) and ∀k ≥ j, τ^k(True) = τ^j(True)
If τ is monotone and S is finite, there exist i, j ≥ 0 such that μZ.τ(Z) = τ^i(False) and νZ.τ(Z) = τ^j(True)

function Lfp(τ : Trans) : Pred
  Q := False; Q' := τ(Q);
  while (Q' ≠ Q) do
    Q := Q'; Q' := τ(Q);
  return Q;

function Gfp(τ : Trans) : Pred
  Q := True; Q' := τ(Q);
  while (Q' ≠ Q) do
    Q := Q'; Q' := τ(Q);
  return Q;

improvements: reduced sums of products, factorizations, etc.
- still exponential for some common functions (e.g. parity)
• some elementary operations may lead to exponential growth (e.g., negation)
• for non-canonical representations it is difficult to test:
- equivalence (checking needed after changes in circuit design)
- satisfiability: ∃x1, ..., xn. f(x1, ..., xn) = 1 ?
∀x. f1(x) = f2(x) ⇔ ¬∃x. f1(x) ⊕ f2(x) = 1

Binary decision diagrams:
• terminal nodes: function value (0 or 1)
• nonterminal nodes: variables
• branches (children): low(v) (left) / high(v) (right): correspond to assignment of 0 or 1 for the variable in the node
BDDs: obtained from binary decision trees by applying 3 reduction rules
- f(n1) = f(n2) => merge n1 and n2
- low(n) = high(n) => eliminate the test at node n

The 3 rules can be applied whatever the variable ordering down the tree
In an ordered BDD (OBDD): one additional condition:
On all paths from root to terminals, variables appear in the same order (there exists a global ordering of variables)
Theorem: For any Boolean function, its representation as an ordered BDD, reduced according to rules 1-3, is unique up to isomorphism
=> canonical representation => equivalence or satisfiability checking in constant time
Note: A subgraph rooted at a BDD node is also a BDD
=> BDDs for several functions may share subgraphs in the same forest

Consider the function: (a1 ∧ b1) ∨ (a2 ∧ b2) ∨ (a3 ∧ b3)
Depending on the variable ordering:
- exponential growth: 2^(n+1) nodes (ordering a1 < a2 < ... < b1 < b2 < ...)
- linear growth: 2(n+1) nodes (ordering a1 < b1 < a2 < b2 < ...)

function Apply(f, g : OBDD, op : Operator) : OBDD
  if is_leaf(f) ∧ is_leaf(g) return op(f, g);
  elsif (f, g, op, h) in apply_cache return h;
  else
    x := topvar(f)    // variable at root of f
    y := topvar(g)
    if (ord(x) = ord(y))    // x = y = same variable
      h := find_bdd(x, Apply(f|x=0, g|x=0, op), Apply(f|x=1, g|x=1, op))
      // find_bdd creates a new BDD if not already existent
    elsif (ord(x) < ord(y))    // x does not appear in g
      h := find_bdd(x, Apply(f|x=0, g, op), Apply(f|x=1, g, op))
    else    // y does not appear in f
      h := find_bdd(y, Apply(f, g|y=0, op), Apply(f, g|y=1, op))
    insert (f, g, op, h) in apply_cache; return h;

• BDDs: pointers into a graph with unique root
• Memory management: reference counter and garbage collection
• Many optimizations and heuristics
- memory layout and traversal for efficient caching
- parallel and distributed algorithms, etc.

• Variable ordering is critical for BDD size
• Functions exist with exponential size BDDs regardless of ordering (e.g., middle bit of a multiplier [Bryant'91])
• shape and size of BDDs evolves during computation => variable reordering is important
- transparent for verification algorithms constructed on top
- reordering adjacent levels does not change pointers into BDD
[Figure: swapping two adjacent levels, with subfunctions f00, f01, f10, f11]

• choice of other decompositions for Boolean functions:
- OBDD: Boole-Shannon decomposition f = (¬x ∧ f|x=0) ∨ (x ∧ f|x=1) = (¬x ∧ f_¬x) ∨ (x ∧ f_x)
- f = f_¬x ⊕ (x ∧ f_δx), with f_δx = f_x ⊕ f_¬x: Reed-Muller (positive Davio) decomposition
• Multiterminal BDDs: allow arbitrary terminal nodes (typically integers)
• BDDs for arithmetic representations: f = x0 + 2 * x1 + 4 * x2 + ...

Applications:
• Mainly: CAD (equivalence checking) and formal verification
• Compact representations for data with some regularities/repetitions, but difficult to express analytically:
- coding theory, large data structures, indexing, computational biology

System represented as binary encoding for states and atomic propositions
=> use BDDs for state sets, transition relation
Check(p) = {s ∈ S | p ∈ L(s)}                 bdd_if_then_else(p, 1, 0)
Check(¬f) = S \ Check(f)                      bdd_not
Check(f ∧ g) = Check(f) ∩ Check(g)            bdd_and
Check(EX f) = CheckEX(Check(f))
CheckEX(f(v)) = ∃v' [f(v') ∧ R(v, v')]        RelProd(f, R, v')
Check(E [f U g]) = CheckEU(Check(f), Check(g))
E [f1 U f2] = μZ. f2 ∨ (f1 ∧ EX Z)            algorithm Lfp
Check(EG f) = CheckEG(Check(f))
EG f = νZ. f ∧ EX Z
algorithm Gfp

Monolithic transition relation
- grows large
- can become major obstacle in building a system model to fit in memory
• disjunctive partitioning (asynchronous systems)
R(v, v') = R1(v, v') ∨ ... ∨ Rn(v, v')
because of distributivity:
∃v' [f(v') ∧ R(v, v')] = ∃v' [f(v') ∧ R1(v, v')] ∨ ... ∨ ∃v' [f(v') ∧ Rn(v, v')]
• conjunctive partitioning (for synchronous systems)
∃ does not distribute over ∧, but may exploit locality (if Ri does not depend on all next-state variables v'):
R(v, v') = R1(v, v1') ∧ ... ∧ Rn(v, vn')
∃v' [f(v') ∧ R(v, v')] = ∃vn' [ ... ∃v2' [ ∃v1' [f(v') ∧ R1(v, v1')] ∧ R2(v, v2')] ... ∧ Rn(v, vn')]
(perform conjunction and quantification successively for each component)

Recall: a fairness constraint is a set of state sets: F = {P1, P2, ..., Pn}, with Pi ⊆ S
Under fairness, EG f is true in the maximal set Z such that:
- all states of Z satisfy f
- ∀Pk ∈ F, s ∈ Z there is a path from s to a state of Z ∩ Pk (passing only through states that satisfy f)
=> can be expressed as fixpoint and thus computed symbolically
EG_fair f = νZ. f ∧ ∧_{k=1..n} EX E [f U (Z ∧ Pk)]
Likewise for the other fundamental operators:
EX_fair f = EX (f ∧ fair)
E_fair [f1 U f2] = E [f1 U (f2 ∧ fair)]

Naive translation is inefficient, recomputes same terms
=> values computed and still needed become parameters

    unsigned fib3(unsigned n, unsigned last, unsigned prev) {
      return n > 0 ? fib3(n - 1, last + prev, last) : last;
    }

    unsigned fib(unsigned n) {
      return fib3(n, 0, 1);
    }

Characters
A char can be stored in a byte (CHAR_BIT ≥ 8 bits)
char can be signed or unsigned. There are also the types signed char (at least -128 to 127) and unsigned char (at least 0 to 255)
Both are included in int
Character constants are written between (single) quotes ' ', and are converted to int in expressions
Digits, lowercase letters and uppercase letters are consecutive (in ASCII; the C standard guarantees this only for digits):
'7' == '0' + 7    '5' - '0' == 5    'E' - 'A' == 4
Escape sequences (textual representation) for special chars:
'\0' null             '\n' newline
'\a' alarm            '\r' carriage return
'\b' backspace        '\f' form feed
'\t' tab              '\'' single quote
'\v' vertical tab     '\\' backslash

putchar, in stdio.h: int putchar(int c);
(sample use): putchar('7')
writes an unsigned char (given as int); returns its value, or EOF (negative constant, usually -1) on error

    #include <stdio.h>

    int main(void) {
      putchar('A');
      putchar(getchar());
      return 0;
    }

'A' (stored in one byte) is just another way of writing 65 (in ASCII)

    void printnat(unsigned n) {
      if (n >= 10) printnat(n / 10);   /* print leading digits first */
      putchar('0' + n % 10);           /* then the last digit */
    }

    int main(void) {
      printnat(312);
      return 0;
    }

getchar, in stdio.h: int getchar(void);
(use): getchar() without parameters, but with ()
Returns an unsigned char converted to int, or the value EOF (negative int, usually -1) if no char could be read (e.g., at end-of-file)
The character read is consumed from input (no longer available); the next call to getchar() returns the next character (not the same!)
getchar() needs to return int, not char, to also include EOF (negative, different from any unsigned char)
When typing, characters are echoed, and placed in a buffer. They are available to getchar() only after typing Enter
WARNING! We have NO CONTROL over input data! The program must validate (check) them, and handle errors

Pure computation has no other effect: this program prints nothing!

    int sqr(int x) { return x * x; }

    int main(void) { return sqr(2); }

Repeatedly calling the same function (in mathematics, or examples sqr, pwr, etc.) with the same parameters gives the same result
Output (printf) produces a visible (and irreversible) effect
Input with getchar() returns a different character on each call; the character is consumed
A side effect is a change in the state of the execution environment, e.g., reading input or writing output
Careful when using functions that have side effects, since they can interact (unexpectedly) through these effects
write pure functions whenever possible!
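The difference between a pure function and one with a side effect can be made concrete with a small sketch. Here sqr is the function from the notes, while counter follows the static-variable idea from the storage-duration discussion; the exact shape of counter is an assumption made for the illustration.

```c
#include <assert.h>

/* Pure: the result depends only on the argument,
   so repeated calls with the same argument agree. */
int sqr(int x)
{
    return x * x;
}

/* Not pure: the static local is state that survives between calls,
   so the "same" call gives a different answer each time. */
int counter(void)
{
    static int cnt = 0;   /* initialized once, at program start */
    return ++cnt;
}
```

Calling sqr(2) twice gives 4 both times, whereas two successive calls to counter() differ by one. Code that calls counter() in two operands of the same expression would depend on evaluation order, which is exactly the kind of unexpected interaction the warning above is about.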
Digits are included one by one in the result: assume input 1475
next char:      1    4    7     5
result:    0    1    14   147   1475
If the current (partial) number is r and the value of the next digit is d, the next number is r' = 10 * r + d
Recursive solution: define a function that computes the number from the already read part r and the current character c:
- when the char read is not a digit, return accumulated number r
- else, recursive call with 10*r + c - '0' and next char read

getchar() returns the character code (e.g. ASCII), NOT the digit value
when typing 6, getchar() does NOT return 6, but '6'
=> we adjust by subtracting '0': 6 is '6' - '0'

ctype.h has declarations of functions for classifying characters:
isalpha, isalnum, isdigit, isspace, islower, isupper, etc.
They take a character as parameter and return true (nonzero) or false (zero) (the character is of the stated type, or not)
Also: case mapping: tolower, toupper (return transformed value)

Redefined problem: define a function that computes the number from the already read part r and the current character c:

    int readnat_rc(int r, int c) {
      return isdigit(c) ? readnat_rc(c - '0' + 10*r, getchar()) : r;
    }

The new char read is passed as argument to the next recursive call
Initially, we start from number 0 and the first character read:

    int readnat(void) { return readnat_rc(0, getchar()); }

Note: no error checking; consumes first character that is not a digit

So far, we've written functions that work only with their parameters
Parameters are bound at call time to the values of the arguments
Sometimes, we repeatedly need to work with values that are obtained inside a function => need to also bind these to a name
We declare a (local) variable and initialize it with a value
This is initialization, not assignment: we still don't need to change the value!
readnat can read the char c rather than get it as parameter:

    int readnat_r(int r) {
      int c = getchar();
      return isdigit(c) ? readnat_r(c - '0' + 10*r) : r;
    }

    int readnat(void) { return readnat_r(0); }

Exercise: trace by hand the calls if the input is 143

Reading a number stops at the first non-digit (or EOF)
That char is not part of the number and should not be consumed
int ungetc(int c, FILE *stream); puts a character c back into a given input stream (file)
For now, we use standard input: ungetc(c, stdin)

    int readnat_r(int r) {
      int c = getchar();
      if (isdigit(c)) return readnat_r(c - '0' + 10*r);
      else { ungetc(c, stdin); return r; }
    }

    int readnat(void) { return readnat_r(0); }

We could also use the comma as sequencing operator for expressions:

    return isdigit(c) ? readnat_r(c - '0' + 10*r) : (ungetc(c, stdin), r);

The expression before the comma is evaluated and its value is discarded; the value of the entire expression is that of the second expression

We now read an integer, with an optional sign

    int readint(void) {
      int c = getchar();
      return c == '-' ? - readnat()
           : c == '+' ? readnat() : (ungetc(c, stdin), readnat());
    }

If c is not a sign, it may be the first digit of the number
ungetc(c, stdin) puts c back into standard input; it will be returned again on the next read, e.g. with getchar()
Our readnat does not handle errors (on non-digit, it returns 0)
We could return a special value (e.g. INT_MIN), and have the caller check (this would reduce the range of useful values by 1)

How do we evaluate this expression? 3*8 - (7-2)*4 + 2*3
Find and mark the last operation to perform:
3*8 - (7-2)*4  (expr1)   +   2*3  (expr2)
We've made the structure explicit! Instead of highlighting, we can write + expr1 expr2
just like function notation: add(expr1, expr2)
+ expr1 expr2 instead of expr1 + expr2
We've just defined prefix expressions:
expr ::= num
       | + expr expr
       | - expr expr
       | * expr expr
Recursion is implicit in the definition: expr on the right is in the same format
With the operator always first, no need for parentheses!
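The recursive definition of prefix expressions translates almost line by line into a recursive evaluator. The sketch below is not the code from the notes: it reads from a string through a cursor (the hypothetical parameter s) instead of calling getchar(), purely so that it can be tried without typing input, and it returns 0 on an unknown operator instead of reporting an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Evaluates one prefix expression starting at *s and advances *s past it.
   expr ::= num | + expr expr | - expr expr | * expr expr */
int eval_prefix(const char **s)
{
    assert(s != NULL && *s != NULL);
    while (isspace((unsigned char)**s))      /* skip whitespace */
        ++*s;
    if (isdigit((unsigned char)**s)) {       /* expr ::= num */
        int r = 0;
        while (isdigit((unsigned char)**s)) {
            r = 10 * r + (**s - '0');        /* r' = 10*r + d */
            ++*s;
        }
        return r;
    }
    char op = **s;
    if (op != '+' && op != '-' && op != '*') /* end of input or unknown operator */
        return 0;
    ++*s;
    int a = eval_prefix(s);                  /* first subexpression */
    int b = eval_prefix(s);                  /* second subexpression */
    return op == '+' ? a + b : op == '-' ? a - b : a * b;
}
```

For the running example, pointing the cursor at the string "+ - * 3 8 * - 7 2 4 * 2 3" and calling eval_prefix yields 10, the value of 3*8 - (7-2)*4 + 2*3. Numbers must be separated by whitespace, since adjacent digits are read as one number. The two recursive calls are placed in separate statements so that the left operand is always read first.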
    prefix                 usual (infix) notation
    * 3 8                  3*8
    * - 7 2 4              (7-2)*4
    - * 3 8 * - 7 2 4      3*8 - (7-2)*4

We'll transcribe the rules into code. An expression is
    expr ::= num | + expr expr | - expr expr | * expr expr
- expr will be a function: read input, return a value
- expr on the right-hand side: recursive calls for subexpressions

The first character decides which definition variant (branch) to follow:
    read first character (non-whitespace)
    if digit: read number, return value
    else if known operator: evaluate first expression, evaluate second expression (recursive calls), apply operator, return value
Code directly follows the structure of the definition!

The condition in the if statement or the ?: operator is usually a comparison, with a relational operator, e.g. x != 0.
The condition must have scalar type (integer, floating point, enum).
Relational operators (==, != and the ordering comparisons) produce 1 or 0, suitable for direct use as conditions.
Library functions often return zero or nonzero (NOT zero or one!):
only write if (isdigit(c)) (nonzero), don't compare to 1!

With logical operators, we can write complex decisions:
    expr   !expr        e1 && e2 | e2 = 0 | e2 != 0        e1 || e2 | e2 = 0 | e2 != 0
    0      1            e1 = 0   |   0    |    0           e1 = 0   |   0    |    1
    != 0   0            e1 != 0  |   0    |    1           e1 != 0  |   1    |    1
    negation (NOT)      conjunction (AND)                  disjunction (OR)
Reminder: logical operators produce 1 for true, 0 for false.
An integer is interpreted as true if nonzero, and as false if zero.

Years divisible by 4 are leap years,
except those divisible by 100, which are not,
except those divisible by 400, which still are.
We can't directly translate like this (can't write the exception case after the normal case is already handled) => need to reverse the order:
    if (yr % 400 == 0) return 1;
    if (yr % 100 == 0) return 0;
    return yr % 4 == 0;
What test order gives the fewest checks on average? (years equally likely)

A year is a leap year if it is divisible by 4 and (it is not divisible by 100 or it is divisible by 400):
    return yr % 4 == 0 && (!(yr % 100 == 0) || yr % 400 == 0);
!(yr % 100 == 0) is equivalent with (yr % 100 != 0)

The ! operator (logical negation) has the highest precedence:
    if (!found)   same as   if (found == 0)   (zero is false)
    if (found)    same as   if (found != 0)   (nonzero is true)
Relational operators have lower precedence than arithmetic ones, so we can naturally compare two arithmetic expressions without extra parentheses.
Avoid side-effects in compound tests (or place them first).

Precedence and evaluation order are different notions!
    2 * f(x) + g(x)
Precedence: multiplication before addition.
Evaluation order: it is unspecified which part of the sum is evaluated first (f or g).

A variable is an object with a name and a type; it stores values (other than function arguments) needed later.
- parameters: for values given to the function (by the caller)
- local variables: for (auxiliary) values computed in the function
Declaration: for one or more variables of the same type:
    double x;
    int a = 1, b, c;
a is initialized with 1, the other variables are not.
Variables declared locally in a block (function) are uninitialized by default!
When we declare a variable, we should know why we need it => good practice to initialize it immediately with the needed value.

A function body { } is a sequence of declarations and statements.
Since C99, declarations and statements can appear in any order (in previous standards: first all declarations, then statements).

The scope of an identifier (e.g., variable) is the program region where it is visible (can be used).
Function parameters have the function body as scope; local variables are visible within the function (block) that declares them.
Thus, parameter names for different functions do not conflict:
like in mathematics, we can define both f(x) and g(x); same for local variables.

The lifetime or storage duration of an object (e.g., variable) is the part of program execution during which storage is reserved for it.
Local variables have automatic storage duration: they are automatically created on each call and destroyed on return (they do not exist between calls, thus do not preserve their value).

In recursive functions we don't need to change variable values: a programming style typical for (pure) functional languages.
Recursive calls create new instances of the parameters, with new values.
In imperative programming, we use:
- variables, to represent objects used in solving the problem (current character; partial result; number left to process)
- assignment, to give a new value to a variable (to express a computation step in the program)
Assignment syntax: variable =
expression
The whole construct is an assignment expression:
1. The expression is evaluated.
2. The value is assigned to the variable and becomes the value of the entire expression.
Example: c = getchar()    n = n-1    r = r * n
May appear in other expressions: if ((c = getchar()) != EOF)
May be chained: a = b = x+3   (a and b get the same value)

Any expression (function call, assignment) followed by ; is a statement:
    printf("hello");   c = getchar();   x = x + 1;

A variable changes value ONLY by assignment, NOT in other expressions, or by passing as parameter!
    n + 1        sqr(x)        toupper(c)         DON'T change n, x, c!
    n = n + 1    x = sqr(x)    c = toupper(c)     do
Careful: = is assignment, == is comparison.

Recall: reversing a number
    rev(465, 0) -> rev(46, 5) -> rev(4, 56) -> rev(0, 564)
We have done repeated processing through recursion:
- on each call, new values for the parameters: n / 10 for n, 10*r + n % 10 for r
- a condition controls repetition (call) or termination (base case): n == 0
- one parameter for each repetitive computation

The while statement expresses the repetition of a statement, guarded by a condition:
    while ( expression ) statement
!!! The expression must be between parentheses.
Semantics: evaluate the expression; if it is true (nonzero):
    (1) execute statement (loop body)
    (2) go back to the start of while (evaluate expression)
Else (if the condition is false / zero), don't execute anything
=> the body executes repeatedly, as long as (while) the condition is true.

We can define iteration (the while loop) recursively:
    while ( expression ) statement
is the same as
    if ( expression ) { statement  while ( expression ) statement }
Recursion is fundamental: it can express any iteration.

Recursive versions (with accumulated result r):
    fact_r(n, r):      return n > 0 ? fact_r(n - 1, r * n) : r;
    pow_r(x, n, r):    return n > 0 ? pow_r(x, n - 1, x * r) : r;
Iterative versions:
    factorial:   r = 1;  while (n > 0) { r = r * n; n = n - 1; }  return r;
    power:       r = 1;  while (n > 0) { r = x * r; n = n - 1; }  return r;

Rewriting is easier if the function is written by accumulating a partial result.
The stop test and the initial result value are the same as in recursion.
Recursion creates new instances of the parameters for each recursive call, with new values dependent on the old ones: e.g. n * r, n - 1, x * r, etc.
Iteration assigns new values to variables in each iteration, using the same rules / expressions: e.g. r = n*r, n = n - 1, r = x*r.
Both variants return the accumulated result.
Recursion and iteration both repeat a processing step => in a problem we use one or the other, rarely both.

Reading a natural number, iteratively:
    unsigned readnat(void) {
      unsigned r = 0;
      int c;
      while (isdigit(c = getchar()))
        r = 10*r + c - '0';
      ungetc(c, stdin);
      return r;
    }
    int main(void) {
      printf("%u\n", readnat());
      return 0;
    }

Reading all input, doing nothing (the body of while is empty; ; is the empty statement).
Do not write ; after while (...) unintentionally!
    int main(void) {
      int c;
      while ((c = getchar()) != EOF);
      return 0;
    }
Reading and printing all input:
    while ((c = getchar()) != EOF) putchar(c);

Searching the input for a given character:
    while ((c = getchar()) != EOF)
      if (c == ...) puts(...);
Often, we search for more text of some sort after that character. Looking for the first word after it:
    while ((c = getchar()) != EOF)
      if (c == ...) {
        while (isspace(c = getchar()));
        ungetc(c, stdin);   /* keep the first letter of the word */
        while (isalpha(c = getchar())) putchar(c);
      }
    return 0;

Function that reads and prints up to a specified character; returns that character, or EOF if reached before that char:
    int c;
    while ((c = getchar()) != EOF && c != stopchar) putchar(c);
    return c;
(c = getchar()) != EOF: assign, then compare.
Same, skipping the input without printing:
    while ((c = getchar()) != EOF && c != stopchar);
    return c;
; after ) means an empty loop body.

Common errors:
NO:  char c = getchar();      YES: int c = getchar();
  - if char is unsigned, c will never compare equal to EOF (-1): we will never leave a while (c != EOF) loop
  - if char is signed, reading byte 255 becomes -1 (EOF): a valid char (code 255) will be taken as EOF (early stop)
NO:  while (!EOF)
  EOF is a nonzero constant (-1), thus the condition is always false, the loop is never entered!
YES: while ((c = getchar()) != EOF)     and careful with the ( )
NO:  while (c = getchar() != EOF)
  != has higher precedence, its result (0 or 1) is assigned to c
NO:  c = getchar(); while (c ...) ...      may loop forever!
YES: while ((c = getchar()) != ... && c != EOF)      will exit!

The do-while statement:
    do statement while ( expression );
Sometimes we know that a cycle needs to be executed at least once (we read at least one character, a number has at least one digit).
Like the while loop, it executes statement as long as the expression evaluates to true (nonzero).
The expression is (re)evaluated after every iteration.
Equivalent with:
    statement
    while ( expression ) statement

When writing a loop, we should consider:
- what variable changes in each iteration?
- what is the loop continuation / stopping condition?
Don't forget to update the variable that controls the loop (otherwise it will loop forever).
What do we know on exiting the loop? The loop condition is false;
we consider this as we reason further about the program.
We inspect / check / test the program:
- mentally, running it with "pencil and paper" on simple cases
- then with increasingly complex tests, including corner cases

Marius Minea
9 October 2017

A simple definition (first programming course):
"Type = set of values together with some operations on that set"
A trivial error (at the ML prompt):
    # (+) 3 (fun x -> x);;
    Error: This expression should not be a function, the expected type is int
=> Some (syntactically correct) programs do not make sense.

Generalizing:
"A type is any property of a program that we can establish without executing the program" [Krishnamurthi, PLAI book]
A type system: a mechanism for distinguishing good programs from bad (informally).
"A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute" [Pierce]

    (+) 1 (if unknown then 3 else (fun x -> x))
would run OK if unknown is true, and would give an error otherwise => can't (always) decide.
Type systems are always prey to the Halting Problem
=> a type system for a general-purpose language must always either over- or under-approximate:
either must accept programs that will error when executed or must
reject programs that might have run without error [Krishnamurthi, PLAI]

[Cardelli and Wegner, On Understanding Types, Data Abstraction and Polymorphism]
The following are untyped universes:
- bit-strings in computer memory: everything is represented as bit-strings => untyped = only one type
- S-expressions in (pure) Lisp: no distinction between program and data, but some structure (more than bit-strings)
- λ-expressions in the λ-calculus: everything is a function (numbers, booleans, if-then-else)
- sets in set theory: everything is an element or a set (can encode mathematics)

Bit-strings can represent operations or characters, integers, ...
Some S-expressions are lists, others are LISP programs.
Some λ-expressions (functions) represent booleans, or integers.
Some sets may denote ordered pairs, leading to functions.
=> We can think of these universes as typed.
But this is an illusion unless there is some means to enforce it.

Typing may be:
- explicit (types are part of the syntax, e.g. all variables typed)
- implicit (types can be reconstructed: type inference)

Why types?
- Types avoid problems related to exposing the internal representation
- Types impose constraints which help to enforce correctness
- Types avoid logical inconsistencies ("set of all sets")
- Types prevent inconsistent interactions between objects
"A type may be viewed as a set of clothes (or a suit of armor) that protects an underlying untyped representation from arbitrary or unintended use." [Cardelli & Wegner]
"Violating the type system involves removing the protective set of clothing and operating directly on the naked representation."

A program might have: [Cardelli, Type Systems]
- trapped errors: cause computation to stop
- untrapped errors: may go unnoticed
A program (fragment) is safe if it does not cause untrapped errors.
A safe language: all program fragments are safe.
But we want more:
- no untrapped errors
- none of the trapped errors that we consider forbidden
- the programmer must avoid the other trapped errors

Static typing: the type of every expression can be determined by static analysis at compile-time, e.g., ML, Java, Pascal (partly unsafe);
well-typed programs are well-behaved
(conservatively).
Strong typing: languages in which all expressions are type-consistent, although the type itself may be statically unknown;
can be done by introducing some run-time type checking.
Static implies strong typing, but strong typing could be dynamic.
Weak checking: only some unsafe operations are detected.
- Pascal: untagged variants and function parameters are unsafe
- Modula-3: separates safe / unsafe modules

Strachey (1967) defines:
- parametric polymorphism: a function works uniformly on a range of types (with some common structure)
- ad-hoc polymorphism: a function works on several different types (which may not have common structure), and may behave in unrelated ways

Refined classification [Cardelli and Wegner]:
    Polymorphism
      universal:  parametric, inclusion
      ad-hoc:     overloading, coercion
- parametric: use of a single abstraction across different types, e.g. the list abstraction 'a list
- inclusion: subtyping and inheritance
- overloading: different functions with the same name; context is used to make the decision;
  could be viewed as syntactic abbreviation handled by preprocessing;
  e.g. multiple methods with the same name, if signatures are distinct
- coercion: semantic operation, converts a type to that expected by a function (otherwise a type error would occur);
  can be done statically or dynamically

The distinction is blurred at times. Discuss:
    3 + 4      3.0 + 4      3 + 4.0      3.0 + 4.0
- Overloading: integer constants may have both type int and real, purely syntactic
- Coercion: an integer value can be used where a real is expected; conversions inferred at compile time (or even at runtime: LISP)
- Subtyping: elements of a subrange type also belong to the supertype
- Value sharing: the nil constant is shared by all the pointer types (Pascal); an example of parametric polymorphism

Summary:
- Coercion: a single abstraction serves several types through implicit type conversion
- Overloading: a single identifier denotes several abstractions
- Parametric: an abstraction operates uniformly across different types
- Inclusion: an abstraction operates through an inclusion relation
[Wm. Paul Rogers, Reveal the magic behind subtype polymorphism, JavaWorld, 2001]

Duck typing:
"when I see a bird that walks like a duck and
swims like a duck and quacks like a duck, I call that bird a duck"
- a form of dynamic typing concerned with just the aspects of an object that are used, rather than the type of the object itself (entire interface)
- offers more freedom (polymorphism without inheritance)
- does not define an explicit interface
- can result in semantically unintended behavior

Type inference. Consider a lambda-calculus with integer constants.
Expressions are defined recursively as: $E ::= n \mid x \mid \lambda x.\,E \mid E\ E$
Types are defined recursively as: $T ::= \mathit{int} \mid \alpha \mid T \to T$, where $\alpha$ is a type variable (a still unbound type).
Type rules impose constraints (matching) on types, e.g. $t(e_1\ e_2) = \tau \Rightarrow t(e_1) = t(e_2) \to \tau$

Union-find: data structure + algorithm for working with equivalence classes.
Operations:
- find(element): finds the representative of the equivalence class
- union(elem1, elem2): declares the elements to be equivalent
Implementation: forest of trees with links up to the parent
- find: returns the tree root (the node itself, if standalone)
- union: links the root of one tree to the other
[Figure: a tree with root Z containing X and Y: find(X) = find(Y) = find(Z) = Z; union(Y, S) links find(Y) and find(S)]

Unification leads to a substitution:
a type variable $\alpha$ can be substituted with any type term that does not contain $\alpha$ (no recursive types).
In $e_1\ 4$, if $t(e_1) = \alpha$, then $\alpha = \mathit{int} \to \beta$ ($\beta$ = new type variable).
Keep a mapping subst from type variables to type terms.
To unify two types, $t_1$ and $t_2$:
- let $r_1 = \mathit{subst}^*(t_1)$ and $r_2 = \mathit{subst}^*(t_2)$ (recursively apply subst to any variables in substitutions)
- unify $r_1$ and $r_2$ (may change the subst mapping):
  - a type variable can be unified with any type term that does not contain it (including another type variable)
  - a type constructor can be unified with the same constructor, by pairwise unifying arguments;
    $\to$ is a binary constructor, int is 0-ary (a type constant)
  - else fail, e.g., unifying $\to$ with int

11 October 2017

Structural (white-box) testing: tests are generated based on the structure of the code.
Other (better) names: glass box, clear box, open box.
Another classification: functional testing (black-box) / structural testing (white-box).
Comparison:
- black-box: at any level / white-box: mostly module / unit testing
- white-box: code change => tests
change
- white-box: easier detection of some errors, but cannot detect omissions (in code or spec)

Control flow graph (CFG): graph representation of the program and implicitly of its execution paths
- nodes = instructions
- edges (labeled with conditions): sequencing between instructions
    x := a + b; y := a*b;
    while (y > a) { a := a + 1; x := a + b }
Usually, straight-line code is grouped together
=> basic block: a sequence of statements with just one entry and one exit point (no jumps into the middle of the code, or from code outside)
image: https://vinaytech.wordpress.com/2008/10/04/abstract-syntax-tree/

Coverage (adequacy) criterion = a criterion to measure if a set of tests is sufficient.
What good are such criteria? For questions such as:
- What program properties should we examine?
- What test data do we select for such properties?
- What objectives do we set for testing?
- Did we test enough?
[Burnstein, Practical Software Testing]

Structural criteria are expressed in terms of program structure, i.e., through the CFG.
But: the number of program paths is usually infinite (loops, recursion);
also: one path, multiple data (proper equivalence classes?)
=> must choose modest structural criteria
=> but not arbitrary: carefully chosen

Properties of adequacy criteria:
- Antiextensionality: there are equivalent programs P and Q such that a test suite T is adequate for P but not for Q, i.e. equivalent programs may need different test suites
- General Multiple Change: there are programs P and Q that have the same form (structure) and a test suite T which is adequate for P but not for Q, i.e. close programs may need different test suites

Line coverage (also: statement coverage, basic block coverage)
Sufficient tests to execute each program statement.
Obviously a necessary criterion (not executed = not tested);
obviously also insufficient.
[Code example lost in extraction: a declaration ending in *s = NULL and a test on len; a case where branch coverage does not subsume line coverage]

Condition coverage
A condition is an elementary boolean expression in a decision.
Needs tests for each possible value of each condition.
Apparently more complex than decision coverage, but does not subsume it.
Example: (x > 5 && y == 3)
Two tests, x = 6, y = 2 and x = 4, y = 3, generate all possible condition values (T and F, F and T) but follow the same branch (false).

Condition/decision coverage
Simultaneously
covers both criteria.
May need more tests than the individual methods, or just recombining them.
Example: (x > 5 && y == 3)
Two tests are still enough: x = 6, y = 3, and x = 4, y = 2.
May be insufficient: the effect of some conditions may mask others.

Multiple condition coverage
Tests all combinations for the components (conditions) of the decision.
Exponential in the number of conditions ($2^n$ tests for n conditions) => often too expensive to implement.
In practice, some of the $2^n$ combinations
- may be irrelevant (for short-circuit evaluation)
- may be infeasible (when conditions are not independent)
=> in general, this requirement is not justified.

Modified condition/decision coverage (MC/DC)
One of the strongest criteria; initially developed at Boeing;
it is a requirement in avionics / safety-critical systems (standard DO-178B).
Complete requirements for an MC/DC test suite:
- all program entry and exit points are covered
- each decision is exercised on both branches
- each condition takes both values
- each condition is shown to affect its enclosing decision (keep the other conditions fixed, varying the condition of interest)
Same tests, whether the language has short-circuit evaluation or not.

Start from the base cases && and || with two conditions.
The AND operator && has a single case (t t) with result t. Changing any condition to f, the result becomes f.
Likewise for || (dual operator), switching t and f.
    a  b  a && b             a  b  a || b
    f  t    f     (1)        t  f    t     (1)
    t  f    f     (2)        f  t    t     (2)
    t  t    t     (3)        f  f    f     (3)
    a: (1, 3)   b: (2, 3)
We indicate the pair of tests relevant for each condition:
(1, 3) shows a may influence the decision; likewise, (2, 3) for b.

For n conditions: one test with all the same, n tests with one condition each flipped:
    a  b  c  a && b && c
    f  t  t      f        (1)
    t  f  t      f        (2)
    t  t  f      f        (3)
    t  t  t      t        (4)
    a: (1, 4)   b: (2, 4)   c: (3, 4)

Consider a && b && (c || d && e)
Start from the innermost expression(s), d && e (watch precedence!)
    d  e  d && e
    f  t    f     (1)
    t  f    f     (2)
    t  t    t     (3)
    d: (1, 3)   e: (2, 3)

We then add c ||
Since || with f does not change truth, add c = f to all tests (1-3).
For the new test (4), choose a test with f result (2) and add c = t.
    c  d  e  c || d && e
    f  f  t      f        (1)
    f  t  f      f        (2)
    f  t  t      t        (3)
    t  t  f      t        (4)
    Now also shows the effect of c:   c: (2, 4)   d: (1, 3)   e: (2, 3)

Now add a && b &&
To the previous tests, add a = t, b = t.
Then choose a test with t result (4), flip in turn a and b to f, showing that a and b influence the decision:
    a  b  c  d  e  a && b && (c || d && e)
    t  t  f  f  t          f                (1)
    t  t  f  t  f          f                (2)
    t  t  f  t  t          t                (3)
    t  t  t  t  f          t                (4)
    f  t  t  t  f          f                (5)
    t  f  t  t  f          f                (6)
    a: (4, 5)   b: (4, 6)   c: (2, 4)   d: (1, 3)   e: (2, 3)
Each test pair has one condition shown to influence the outcome; all other conditions have the same value in both tests.
By construction, it follows that n variables need n + 1 tests.

Consider a && b || c && d
We write tests for both subexpressions (given by precedence):
    a  b  a && b              c  d  c && d
    f  t    f     (1')        f  t    f     (1")
    t  f    f     (2')        t  f    f     (2")
    t  t    t     (3')        t  t    t     (3")
    a: (1', 3')   b: (2', 3')   c: (1", 3")   d: (2", 3")
We combine with ||
Since || with f has no effect, choose one f test from each group (1' and 1") and combine it with all tests in the other group:
    a  b  c  d  a && b || c && d
    f  t  f  t         f           (1 = 1'+1")
    f  t  t  f         f           (2 = 1'+2")
    f  t  t  t         t           (3 = 1'+3")
    t  f  f  t         f           (4 = 2'+1")
    t  t  f  t         t           (5 = 3'+1")
    a: (1, 5)   b: (4, 5)   c: (1, 3)   d: (2, 3)
We have thus kept the influence of each individual condition.

The above analysis is valid for independent conditions: it's always possible to generate the designed tests.
In reality, conditions may be coupled (correlated).
Example: (z - x >= 3 && z - y >= 1 || y ... [part of the example lost in extraction]
For z - x >= 3 to influence the condition, we'd need ... x = 5, and z - y >= 1.
But from these, we get z - x >= 3, thus the condition can't be false, and can't influence the decision!
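The six tests built above for a && b && (c || d && e) can be checked mechanically. The following is a minimal sketch, not a tool from the lecture: the names decision, tests, shows_influence and suite_is_mcdc are hypothetical, and the check looks for pairs of tests that differ in exactly one condition and produce different outcomes, which is how the tables above pair the tests.

```c
#include <assert.h>
#include <stdbool.h>

enum { NCOND = 5, NTEST = 6 };

/* The decision under test; conditions are indexed a=0, b=1, c=2, d=3, e=4. */
static bool decision(const bool v[NCOND])
{
    return v[0] && v[1] && (v[2] || (v[3] && v[4]));
}

/* Tests (1)-(6) from the construction in the text. */
static const bool tests[NTEST][NCOND] = {
    /*  a      b      c      d      e   */
    { true,  true,  false, false, true  },   /* (1) result f */
    { true,  true,  false, true,  false },   /* (2) result f */
    { true,  true,  false, true,  true  },   /* (3) result t */
    { true,  true,  true,  true,  false },   /* (4) result t */
    { false, true,  true,  true,  false },   /* (5) result f */
    { true,  false, true,  true,  false },   /* (6) result f */
};

/* True if tests i and j differ only in condition k and yield different outcomes. */
static bool shows_influence(int i, int j, int k)
{
    for (int m = 0; m < NCOND; m++) {
        bool differs = tests[i][m] != tests[j][m];
        if (differs != (m == k))
            return false;             /* must differ in k, and only in k */
    }
    return decision(tests[i]) != decision(tests[j]);
}

/* True if every condition has at least one such pair in the suite. */
bool suite_is_mcdc(void)
{
    for (int k = 0; k < NCOND; k++) {
        bool found = false;
        for (int i = 0; i < NTEST && !found; i++)
            for (int j = i + 1; j < NTEST && !found; j++)
                found = shows_influence(i, j, k);
        if (!found)
            return false;
    }
    return true;
}
```

Indices are zero-based, so the pair (4, 5) for condition a corresponds to shows_influence(3, 4, 0).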
=> trying to get MC/DC coverage, we can detect if a condition is written needlessly complex, or has irrelevant parts (a possible logic error).
In this case, ... z - x >= 3 ... to true: (z - y >= 1 || y ... [rest of the example lost in extraction]

Coverage of individual decisions misses interactions, e.g. combinations of successive if statements in the program
=> we need a criterion closer to path coverage (which would cover all execution paths).
Approach: identify n relevant predicates (conditions) in the program.
Try to generate all $S \cdot 2^n$ possible combinations (S states (program locations), n predicates)
=> correlates all states and predicates in the program.
[Code example garbled in extraction: array partitioning around a pivot, with indices lo = 1 and hi = n-1]

Loop testing [Beizer, Software Testing Techniques]
For simple cycles:
- zero iterations (cycle is skipped); possibly also: negative counter (correct behavior?)
- one iteration
- two iterations (may catch initialization errors)
- one typical intermediate value
- N-1 iterations
- N iterations
- try to force N+1 iterations (more than the assumed max)
For a nonzero minimum: try min-1, min, min+1.
For nested cycles:
1. with the minimal number of outer iterations, try the inner cycle completely (as an independent cycle)
2. continue with the following cycles outwards
   - with the inner cycle at a typical iteration count
   - vary the count for the current cycle
3. finally, vary all cycles together from min to max
Path-based variants:
- all paths that traverse a cycle once, without repetition (boundary test)
- all paths that repeat a test, at most once (interior test)

LCSAJ coverage
- an LCSAJ sequence: straight-line code followed by a jump
- length N LCSAJ criterion: N such consecutive sequences
- N = 1 ensures line coverage
- N = 2 ensures branch coverage (even more)

Mutation testing
Try changing decisions / statements according to some patterns, to detect if the program runs differently.
Examples: ...

Option: detect tampering with the return address
- need a copy of the correct value
Check bytes next to (before) the return address => canary
- terminator canary: 0, CR, LF, EOF
- random canary (attacker doesn't know it => can't put it back)
- random XOR canary (must also know the control value)
Who / how / when implements these checks?
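The canary idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: real canaries are inserted and checked by the compiler in the function prologue and epilogue, whereas here the "frame" is an ordinary struct and the overflow is a bounded memcpy over that struct, so the behaviour stays well defined. The names frame, frame_init, frame_write and frame_canary_intact are hypothetical, and using the byte 0xFF to stand for EOF is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 12, CANARY_SIZE = 4 };

/* Terminator canary: 0, CR, LF and 0xFF (standing for EOF, an assumption). */
static const unsigned char CANARY[CANARY_SIZE] = { 0x00, '\r', '\n', 0xFF };

/* Simulated frame: the canary sits between the buffer and the guarded slot. */
struct frame {
    unsigned char buffer[BUF_SIZE];
    unsigned char canary[CANARY_SIZE];
    unsigned char ret_slot[sizeof(void *)];
};

/* Clears the frame and places the canary (what a function prologue would do). */
void frame_init(struct frame *f)
{
    memset(f, 0, sizeof *f);
    memcpy(f->canary, CANARY, CANARY_SIZE);
}

/* Simulates an unchecked copy that starts at the buffer and may run past it.
   The length is clamped to the frame size to keep the simulation well defined. */
void frame_write(struct frame *f, const void *data, size_t len)
{
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy(f, data, len);             /* buffer is the first member of the frame */
}

/* The check a function epilogue would perform before using the return slot. */
bool frame_canary_intact(const struct frame *f)
{
    return memcmp(f->canary, CANARY, CANARY_SIZE) == 0;
}
```

A write that stays within the 12 buffer bytes leaves the canary intact; any longer write has to pass through the canary before reaching ret_slot, so the check detects it unless the attacker reproduces the canary bytes exactly.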
Option: hamper execution
The attacker must execute the injected code:
non-executable stack / write XOR execute.
If you can't execute code on the stack, try something else.
A typical attack is to call exec or some other library function
=> instead of executing code (calling exec), put the address (and parameters) of a libc function on the stack, in place of the normal return address.
Which protections are effective?
Attacks can be chained: put multiple library addresses on the stack.
Generalization: return-oriented programming.

Other targets: pointers
- function pointers (denote code)
- pointers from longjmp
- pointers to user functions
- pointers to library functions (PLT: procedure linkage table)
- or usual pointers to data
Attacks might be in two steps:
- a buffer overflow overwrites a pointer (to the desired address)
- in later code, this is used to overwrite a critical area: return address, PLT, etc.
Szekeres, Payer, Wei, Song. SoK: Eternal War in Memory, IEEE S&P 2013

Formal verification. Lecture 3, Oct 2005
Marius Minea
Symbolic model checking. Binary decision diagrams

Explicit-state model checking
Need to represent each state individually => the size of the state space severely limits applicability
(the size of a state determines how many states we can represent in memory)
- typically, limited to a few million states
State explosion: for composed systems, the state space is the product of the component state spaces => exponential in the number of components
=> Much of the focus in formal verification is scaling to large state spaces.
If the reachable state set is much smaller than the potential complete state space, we can try to encode reached states using fewer bits (hashing, used in SPIN).
However, this is an approximation: on reaching an already hashed state, the search stops (even though the actual state may be different)
=> part of the state space may remain unexplored => the method is not sound.

Problem: compute the set of states reachable from the initial states (EF true)
- by forward traversal of the graph starting from the initial states
- R: set of explored states; F: frontier reached in
current step

With individual states:
    $R = \emptyset$; $F = S_0$
    while ($F \neq \emptyset$)
      choose $s \in F$; $R = R \cup \{s\}$; $F = F \setminus \{s\}$
      forall $s'$ with $s \to s'$
        if $s' \notin F \cup R$
          $F = F \cup \{s'\}$
With sets:
    $R = \emptyset$; $F = S_0$
    while ($F \not\subseteq R$)
      $R = R \cup F$
      $F = \{s' \mid \exists s \in F.\ s \to s'\}$      (or $F = \{s' \mid \ldots\} \setminus R$, with test $F \neq \emptyset$)
=> The algorithm can be expressed much more easily if the set of successors of a set can be computed.
=> The set R of reached states grows in each iteration, but is bounded for a finite state space.

Symbolic model checking
- A new approach, based on exploring state sets
  - idea: a set may sometimes be represented (by a formula) in a much more compact way than by individually representing each state
  - need: efficient representation and manipulation for state sets and the transition relation
  [McMillan'92], with binary decision diagrams (BDDs) [Bryant'86]
- key idea 1: working with state sets
  - used also for infinite state sets (continuous-time or hybrid systems)
- key idea 2: iterative computation until no more change => notion of fixpoint

Def: $x \in D$ is a fixpoint for $f : D \to D$ if $f(x) = x$.
Def: A lattice is a partially ordered set in which any finite subset has a least upper bound and a greatest lower bound.
Ex: the powerset (set of subsets) $\mathcal{P}(S)$ of S, with $\subseteq$ as order.
- We work with functions $\tau : \mathcal{P}(S) \to \mathcal{P}(S)$ over the lattice $\mathcal{P}(S)$.
- We regard $S' \subseteq S$ as a predicate over S: $S'(s) = \mathit{true} \Leftrightarrow s \in S'$;
  in particular: $\emptyset = \mathit{false}$, $S = \mathit{true}$
=> $\tau : \mathcal{P}(S) \to \mathcal{P}(S)$ is a predicate transformer.
Def:
- $\tau$ is monotone if $P \subseteq Q \Rightarrow \tau(P) \subseteq \tau(Q)$
- $\tau$ is union-continuous if for any sequence $P_1 \subseteq P_2 \subseteq \ldots$ we have $\tau(\bigcup_i P_i) = \bigcup_i \tau(P_i)$
- $\tau$ is intersection-continuous if for any sequence $P_1 \supseteq P_2 \supseteq \ldots$ we have $\tau(\bigcap_i P_i) = \bigcap_i \tau(P_i)$

A monotone predicate transformer over $\mathcal{P}(S)$ always has
- a minimal fixpoint, denoted $\mu Z.\ \tau(Z)$
- and a maximal fixpoint, denoted $\nu Z.\ \tau(Z)$   [Tarski]
If S is finite and $\tau$ is monotone, then $\tau$ is continuous for union and intersection.
$\tau$ monotone => $\tau^i(\mathit{False}) \subseteq \tau^{i+1}(\mathit{False})$ and
$\tau^i(\mathit{True}) \supseteq \tau^{i+1}(\mathit{True})$
If $\tau$ is monotone and S is finite, there exist $i, j \ge 0$ such that
$\forall k \ge i,\ \tau^k(\mathit{False}) = \tau^i(\mathit{False})$ and $\forall k \ge j,\ \tau^k(\mathit{True}) = \tau^j(\mathit{True})$
If $\tau$ is monotone and S is finite, there exist $i, j \ge 0$ such that
$\mu Z.\ \tau(Z) = \tau^i(\mathit{False})$ and $\nu Z.\ \tau(Z) = \tau^j(\mathit{True})$

    function Lfp(τ : Trans) : Pred
      Q := False; Q' := τ(Q);
      while (Q' ≠ Q) do
        Q := Q'; Q' := τ(Q);
      return Q;

    function Gfp(τ : Trans) : Pred
      Q := True; Q' := τ(Q);
      while (Q' ≠ Q) do
        Q := Q'; Q' := τ(Q);
      return Q;

We identify a CTL formula f with the set of states that satisfy it: $\{s \mid M, s \models f\}$
- $\mathbf{AF}\, f = \mu Z.\ f \vee \mathbf{AX}\, Z$
- $\mathbf{AG}\, f = \nu Z.\ f \wedge \mathbf{AX}\, Z$
- $\mathbf{EF}\, f = \mu Z.\ f \vee \mathbf{EX}\, Z$
- $\mathbf{EG}\, f = \nu Z.\ f \wedge \mathbf{EX}\, Z$
- $\mathbf{A}[f_1\, \mathbf{U}\, f_2] = \mu Z.\ f_2 \vee (f_1 \wedge \mathbf{AX}\, Z)$
- $\mathbf{E}[f_1\, \mathbf{U}\, f_2] = \mu Z.\ f_2 \vee (f_1 \wedge \mathbf{EX}\, Z)$
- $\mathbf{A}[f_1\, \mathbf{R}\, f_2] = \nu Z.\ f_2 \wedge (f_1 \vee \mathbf{AX}\, Z)$
- $\mathbf{E}[f_1\, \mathbf{R}\, f_2] = \nu Z.\ f_2 \wedge (f_1 \vee \mathbf{EX}\, Z)$
minimal fixpoint: liveness properties: F
maximal fixpoint: safety properties (invariants): G

Symbolic model checking algorithm
- works by structural decomposition of the formula
Check(f) returns $\{s \in S \mid M, s \models f\}$ (the set of states satisfying f)
    Check(p) = $\{s \in S \mid p \in L(s)\}$        (atomic propositions)
    Check($\neg f$) = $S \setminus$ Check(f)
    Check($f \wedge g$) = Check(f) $\cap$ Check(g)
    Check($\mathbf{EX}\, f$) = CheckEX(Check(f))
    CheckEX(f(v)) = $\exists v'\, [f(v') \wedge R(v, v')]$
    Check($\mathbf{E}[f\, \mathbf{U}\, g]$) = CheckEU( ... [rest of the slide lost in extraction]

Representations of Boolean functions [start of the slide lost in extraction]
- improvements: reduced sums of products, factorizations, etc.
  - still exponential for some common functions (e.g. parity)
- some elementary operations may lead to exponential growth (e.g., negation)
- for non-canonical representations it is difficult to test:
  - equivalence (checking needed after changes in circuit design)
  - satisfiability: $\exists x_1, \ldots, x_n\ f(x_1, \ldots, x_n) = 1$ ?
    $\forall x\ f_1(x) = f_2(x) \equiv \neg\exists x\ f_1(x) \oplus f_2(x) = 1$

Binary decision trees
- terminal nodes: function value (0 or 1)
- nonterminal nodes: variables
- branches (children): low(v) (left) / high(v) (right): correspond to the assignment of 0 or 1 for the variable in the node
[Figure: binary decision tree with leaf values 0 0 0 1 0 1 0 1]
BDDs: obtained from binary decision trees by applying 3 reduction rules.
[Figure: rule 1]
[Figure: rule 2] $f(n_1) = f(n_2)$ => merge $n_1$ and $n_2$
[Figure: rule 3] low(n) = high(n) => eliminates testing at node n

The 3 rules can be applied whatever the variable ordering down the tree.
In an ordered BDD (OBDD): one additional condition:
on all paths from root to terminals, variables appear in the same order (there exists a global ordering of variables).
Theorem: For any Boolean function, its representation as an ordered BDD, reduced according to rules 1-3, is unique up to isomorphism
=> canonical representation => equivalence or satisfiability checking in constant time.
Note: A subgraph rooted at a BDD node is also a BDD
=> BDDs for several functions may share subgraphs in the same forest.

Consider the function: $(a_1 \wedge b_1) \vee (a_2 \wedge b_2) \vee (a_3 \wedge b_3)$
[Figure: BDDs for the same function under two variable orderings]
Linear growth: $2(n+1)$        Exponential growth: $2^{n+1}$

    function Apply(f, g : OBDD, op : Operator) : OBDD
      if is_leaf(f) ∧ is_leaf(g) return op(f, g);
      elsif (f, g, op, h) in apply_cache return h;
      else
        x := topvar(f)      // variable at root of f
        y := topvar(g)
        if (ord(x) = ord(y))      // x = y = same variable
          h := find_bdd(x, Apply(f|x=0, g|x=0, op), Apply(f|x=1, g|x=1, op))
          // find_bdd creates a new BDD if not already existent
        elsif (ord(x) ... [rest of the algorithm lost in extraction]

BDD implementation
- [start of the slide lost in extraction] ... pointers into a graph with unique root
- Memory management: reference counter and garbage collection
- Many optimizations and heuristics
  - memory layout and traversal for efficient caching
  - parallel and distributed algorithms, etc.

Variable ordering
- Variable ordering is critical for BDD size
- Functions exist with exponential size BDDs regardless of ordering (e.g., middle bit of a multiplier [Bryant'91])
- shape and size of BDDs evolve during computation => variable reordering is important
  - transparent for verification algorithms constructed on top
  - reordering adjacent levels does not change pointers into the BDD
[Figure: swapping two adjacent levels, with cofactors $f_{00}$, $f_{01}$, $f_{10}$, $f_{11}$ before and after]

Variants
- choice of other decompositions for Boolean functions:
  - OBDD: Boole-Shannon decomposition: $f = \bar{x} \wedge f|_{x=0} \vee x \wedge f|_{x=1} = \bar{x} \wedge f_{\bar{x}} \vee x \wedge f_x$
  - $f = f_{\bar{x}} \oplus x \wedge f_{\delta x}$: Reed-Muller decomposition (positive Davio decomposition)
- Multiterminal BDDs: allow arbitrary terminal nodes (typically integers)
- BDDs for arithmetic representations: $f = x_0 + 2 x_1 + 4 x_2 + \cdots$

Applications
- Mainly: CAD (equivalence checking) and formal verification
- Compact representations for data with some regularities / repetitions, but difficult to express analytically:
  - coding theory, large data structures, indexing, computational biology

Symbolic model checking with BDDs
The system is represented as a binary encoding for states and atomic propositions
=> use BDDs for state sets and the transition relation
    Check(p) = $\{s \in S \mid p \in L(s)\}$                   bdd_if_then_else(p, 1, 0)
    Check($\neg f$) = $S \setminus$ Check(f)                     bdd_not
    Check($f \wedge g$) = Check(f) $\cap$ Check(g)               bdd_and
    Check($\mathbf{EX}\, f$) = CheckEX(Check(f))
    CheckEX(f(v)) = $\exists v'\, [f(v') \wedge R(v, v')]$       RelProd(f, R, v')
    Check($\mathbf{E}[f\, \mathbf{U}$ ... [rest of the slide lost in extraction]
R(v, v')] = ∃v'[f(v') ∧ R_1(v, v')] ∨ ··· ∨ ∃v'[f(v') ∧ R_n(v, v')]
• conjunctive partitioning (for synchronous systems)
  ∃ does not distribute over ∧, but we may exploit locality (if R_i does not depend on all next-state variables v'):
  $R(v, v') = R_1(v, v'_1) \wedge \cdots \wedge R_n(v, v'_n)$
  $\exists v'[f(v') \wedge R(v, v')] = \exists v'_n[\cdots \exists v'_1[f(v') \wedge R_1(v, v'_1)] \wedge \cdots \wedge R_n(v, v'_n)]$
  (perform conjunction and quantification successively for each component)

Recall: the fairness constraint is a set $F = \{F_1, F_2, \ldots, F_n\}$, with $F_i \subseteq S$
EG f is true in the maximal set Z such that:
- all states of Z satisfy f
- $\forall F_k \in F, s \in Z$ there is a path from s to a state of $Z \cap F_k$ (passing only through states that satisfy f)
=> can be expressed as a fixpoint and thus computed symbolically
$EG_{fair}\, f = \nu Z.\ f \wedge \bigwedge_{k=1}^{n} EX\, E[f\ U\ (Z \wedge F_k)]$
Likewise for the other fundamental operators: …

Main advantages of model checking:
- completely automated
- generates counterexamples that identify errors
• for existential formulas (E): produces a witness, a path for which the formula is true
• for universal formulas (A): produces a counterexample
• a counterexample for a universal formula is a witness for its negation (its dual existential formula)

- minimal fixpoint: $EF\, f = \mu Z.\ f \vee EX\, Z$
- compute and retain successive approximations $f = Q_0 \subseteq Q_1 \subseteq \cdots \subseteq Q_k$
- $Q_k$: set of states from which f can be reached in at most k steps
- find intersection $Q_k \cap S_0 \neq \emptyset$ (first traversal: backwards, symbolic)
- choose $s_k \in S_0 \cap Q_k$
- compute set $Succ(s_k)$ of successors for $s_k$
- must have nonempty intersection with $Q_{k-1}$ (from $s_k$, f is reachable in at most k steps, so there is a successor reaching it in k - 1 steps)
- choose $s_{k-1} \in Succ(s_k) \cap Q_{k-1}$, etc. until $Q_0 = f$ (second traversal, forward, through individual states)
- we have found path s_k … s_0
reaching f

Marius Minea
5 October 2017

Aleph One, Smashing the stack for fun and profit, Phrack magazine 7(49)
Overflow any stack-placed buffer accepting unchecked input
unsafe functions: strcpy, strcat, scanf with %s
gets: removed from the C standard in 2011; safe alternatives introduced for some
Danger not limited to unsafe input; also careless overflow of index in (local) array

Reason: low abstraction level of C
- no objects carrying size info (that could be checked)
- can create arbitrary pointer values using pointer arithmetic
- checks are the responsibility of the user, not of the runtime system

    void func (char *str) {
      char buffer[12];
      int variable_a;
      strcpy (buffer, str);
    }
    int main() {
      char *str = "I am greater than 12 bytes";
      func (str);
    }

(a) A code example    (b) Active Stack Frame in func()
http://www.cis.syr.edu/~wedu/seed/Labs_12.04/Software/Buffer_Overflow/Buffer_Overflow.pdf
[Figure: stack frame of func() with buffer[0] … buffer[11], previous frame pointer, return address, str (function argument); the copied string overflows the buffer toward the next stack frame]

return address slot overwritten
on function return, execution jumps wherever that points to

For the exploit, must know:
1) position of return address slot relative to buffer start: i.e., buffer size and stack layout (calling convention)
2) memory address of buffer (to fill in proper payload address)
[Figure: (a) Jump to the malicious code (b) Improve the chance]

Let's revisit exploit assumptions:
- can determine address to inject payload
- can overwrite return address
- return address tampering is not detected
- can execute payload code

Option: make it difficult to find attack point (address)
Attacker must know address to jump to: Address Space Layout Randomization
What flexibility does the attacker code have? Is the attack still realistic? For 32-bit vs 64-bit?
Option: detect change
check if RET address altered before function return
Two basic ideas:
- Check return address itself => need copy of correct value
- Check bytes next to (before) ret address => canary
  terminator canary: 0, CR, LF, EOF
  random canary (don't know => can't put back)
  random XOR canary (must also know control value)
Who / how / when implements these checks?

Option: hamper execution
Attacker must execute injected code: Non-executable stack / write XOR execute

If you can't execute code on stack, try something else
Typical attack is to call exec or some other library function
=> instead of executing code (call exec), put address (and parameters) of libc function on stack, in place of normal ret address
Which protections are effective?

Can chain attacks: put multiple library addresses on stack
Generalize: return-oriented programming

Function pointers (denote code)
- pointers from longjmp
- pointers to user functions
- pointers to library functions (PLT: procedure linkage table)
or usual pointers to data
Attacks might be in two steps:
- a buffer overflow overwrites a pointer (to desired address)
- in later code, this is used to overwrite a critical area: ret address, PLT, etc.

Szekeres, Payer, Wei, Song: SoK: Eternal War in Memory, IEEE S&P 2013

Marius Minea
marius@cs.upt.ro
10 October 2016

Copy-paste is bad! Any later change must be done twice
Code used multiple times should be put in a function
Example: median of three numbers
After comparing two numbers, we know the smaller and the larger one: need same computation, with switched numbers => function!
…
Naive translation is inefficient, recomputes same terms
=> values computed and still needed become parameters
    unsigned fib3(unsigned n, unsigned last, unsigned prev) { return n > 0 ?
fib3(n - 1, last + prev, last) : last; }
    unsigned fib(unsigned n) { return fib3(n, 0, 1); }
…
recursive solution
given (parameters): x and the current approximation
result = a satisfactory approximation (precision ε)
Re-state problem: compute √x … a_n …
Computation: if precision good, |a_{n+1} − a_n| …
    … fabs(a_n - x / a_n) …

Define a function that computes the number from the already read part r and the current digit c:
- when the char read is not a digit, return accumulated number r
- else, recursive call with 10 · r + c, reading next character

getchar() returns the character code (e.g. ASCII), NOT the value of the digit
when typing 6, getchar() does NOT return 6, but '6' => we adjust with -'0': 6 == '6' - '0'
ctype.h has declarations of functions for classifying characters: isalpha, isalnum, isdigit, isspace, islower, isupper, etc.
They take a character as parameter and return true (nonzero) or false (zero) (the character is of the stated type, or not)

Redefined problem: Define a function that computes the number from the already read part r and the current digit c:
    unsigned readnat_rc(unsigned r, int c) {
      return isdigit(c) ? readnat_rc(10*r + (c - '0'), getchar()) : r;
    }
The new char read is passed as argument to the next recursive call
Initially, we start from number 0 and the first character read:
    unsigned readnat(void) { return readnat_rc(0, getchar()); }
Note: no error checking; consumes first character that is not a digit

So far, we've written functions that work with their parameters
Parameters are bound at call time to the values of the arguments
Sometimes, we repeatedly need to work with values that are obtained from a function => need to also bind these to a name
We declare a (local) variable and initialize it with a value
This is initialization; we still don't need to change the value!
readnat can read the char c rather than get it as parameter:
    unsigned readnat_r(unsigned r) { int c = getchar(); return isdigit(c) ?
readnat_r(10*r + (c - '0')) : r; }
    unsigned readnat(void) { return readnat_r(0); }

Reading a number stops at the first non-digit (or EOF)
That char is not part of the number and should not be consumed
    int ungetc(int c, FILE *stream);
puts a character c back into a given input stream (file)
For now, we use standard input: ungetc(c, stdin)
    unsigned readnat_r(unsigned r) {
      int c = getchar();
      if (isdigit(c)) return readnat_r(10*r + (c - '0'));
      else { ungetc(c, stdin); return r; }
    }
    unsigned readnat(void) { return readnat_r(0); }
We could also use the comma as sequencing operator for expressions:
    isdigit(c) ? readnat_r(10*r + (c - '0')) : (ungetc(c, stdin), r);
The expression before the comma is evaluated, its value is ignored; the value of the entire expression is that of the second expression

We now read an integer, with an optional sign
    int …(void) {
      int c = getchar();
      return c == '-' ? - readnat() : c == '+' ? readnat() : (ungetc(c, stdin), readnat());
    }
if c is not a sign, it may be the first digit of the number
ungetc(c, stdin) puts c back into standard input; it will be returned again on the next read, e.g. with getchar()

A variable is an object with a name and a type; it stores values (other than function arguments) needed later
Parameters: for values given to the function (by the caller)
Local variables: for (auxiliary) values computed in the function
Declaration: for one or more variables of the same type:
    double x; int a = 1, b, c;
a is initialized with 1, the other variables are not
Variables declared locally in a block (function) are uninitialized by default!
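The reading functions discussed above can be sketched as one compilable unit. This is only a sketch under stated assumptions: the `FILE *in` parameter is introduced here so the functions can be exercised on any stream (the slides read from stdin with getchar), the name `readint` and the `unsigned`/`int` types are illustrative choices, and there is no overflow checking. Every local variable is initialized at its declaration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Accumulates the digits read from `in` into r (the part already read).
   The first non-digit character is pushed back with ungetc.
   The stream parameter is an assumption of this sketch. */
unsigned readnat_r(FILE *in, unsigned r) {
    int c = getc(in);                 /* int, not char: must be able to hold EOF */
    if (isdigit(c))
        return readnat_r(in, 10 * r + (unsigned)(c - '0'));
    if (c != EOF)
        ungetc(c, in);                /* not part of the number: put it back */
    return r;
}

unsigned readnat(FILE *in) {
    assert(in != NULL);
    return readnat_r(in, 0);
}

/* Reads an integer with an optional sign (hypothetical name). */
int readint(FILE *in) {
    assert(in != NULL);
    int c = getc(in);
    if (c == '-')
        return -(int)readnat(in);
    if (c != '+' && c != EOF)
        ungetc(c, in);                /* not a sign: may be the first digit */
    return (int)readnat(in);
}
```

Calling `readint(stdin)` corresponds to the getchar-based version; after the call, the character that ended the number is still available to the next read.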
When we declare a variable, we should know why we need it
=> good practice to initialize it immediately with the needed value
A function body { } is a sequence of declarations and statements
since C99, declarations and statements can appear in any order (in previous standards: first all declarations, then statements)

The scope of an identifier (e.g., variable) is the program region where it is visible (can be used)
Function parameters have the function body as scope: they are local to the function
Thus, parameter names for different functions do not conflict
like in mathematics, we can have f(x) = … and g(x) = …
same for local variables

The lifetime or storage duration of an object (e.g., variable) is the part of program execution during which storage is reserved for it
Local variables have automatic storage duration: they are automatically created on each call and destroyed on return (they do not exist between calls, thus do not preserve their value)

The condition in the if statement or the ? : operator is usually a comparison, with a relational operator: x != 0, n …
The condition must have scalar type (integer, floating point, enum)
Relational operators (==, !=, …) give results suitable for direct use as conditions
Library functions often return zero or nonzero (NOT zero or one!)
only test if (isdigit(c)) (nonzero), don't compare to 1!

With logical operators, we can write complex decisions:

    negation (NOT)        conjunction (AND)                 disjunction (OR)
    expr   !expr          e1 && e2   e2 = 0   e2 ≠ 0        e1 || e2   e2 = 0   e2 ≠ 0
    0      1              e1 = 0     0        0             e1 = 0     0        1
    ≠ 0    0              e1 ≠ 0     0        1             e1 ≠ 0     1        1

Reminder: logical operators produce 1 for true, 0 for false
An integer is interpreted as true if nonzero, and as false if zero

Years divisible by 4 are leap years, except those divisible by 100, which are not, except those divisible by 400, which still are
Can't directly translate like this (can't write exception case after normal case already handled)
=> need to reverse order:
    int …(unsigned yr) {
      if (yr % 400 == 0) return 1;
      if (yr % 100 == 0) return 0;
      return yr % 4 == 0;
    }
A year is a leap year if it is divisible by 4 and (it is not divisible by 100 or it is divisible by 400)
    int …(unsigned yr) {
      return yr % 4 == 0 && (!(yr % 100 == 0) || yr % 400 == 0);
    }
!(yr % 100 == 0) is equivalent with (yr % 100 != 0)

The ! operator (logical negation): highest precedence
if (!
found) same as if (found == 0) (zero is false)
if (found) same as if (found != 0) (nonzero is true)
Relational operators: lower precedence than arithmetic ones; we can naturally write x >= …
Avoid side-effects in compound tests (or place them first)

Precedence and evaluation order are different notions!
2 * f(x) + g(x): multiplication before addition (precedence)
unspecified which part of the sum is evaluated first (f or g)

In recursive functions we don't need to change variable values
a programming style typical for (pure) functional languages
Recursive calls create new instances of parameters with new values
In imperative programming, we use:
- variables, to represent objects used in solving the problem (current character; partial result; number left to process)
- assignment, to give a new value to a variable (to express a computation step in the program)

Syntax: variable = expression
The entire assignment is an expression:
1. The expression is evaluated
2. the value is assigned to the variable and becomes the value of the entire expression
Example: c = getchar()    n = n-1    r = r * n
May appear in other expressions: if ((c = getchar()) != EOF)
May be chained: a = b = x+3    a and b get the same value
Any expression (function call, assignment) followed by ; is a statement
    printf("hello"); c = getchar(); x = x + 1;

A variable changes value only through assignment!
NOT in other expressions, or by passing as parameter!
n + 1, sqr(x), toupper(c) DON'T change n, x, c!
n = n + 1, x = sqr(x), c = toupper(c) do
= assignment    == comparison

Recall: reversing a number
    rev(465, 0) = rev(46, 5) = rev(4, 56) = rev(0, 564) = 564
We have done repeated processing through recursion
on each call, for n / 10, 10*r + n % 10
Controls repetition (call) or termination (base case): n == 0
one … for each repetitive computation

Expresses the repetition of a statement, guarded by a condition:
    while ( expression )
      statement
[Figure: flowchart of the while loop: expression; if true, statement and back to the test; if false, exit]
Expression must be between parentheses!
Semantics: evaluate expression; if it is true (nonzero):
(1) execute statement (loop body)
(2) go back to start of while (evaluate expression)
Else (if condition is false / zero), don't execute anything
=> body executes repeatedly, as long as (while) condition is true

We can define iteration (the while loop) recursively:
    while ( expression ) statement
is the same as
    if ( expression ) { statement while ( expression ) statement }
Recursion is fundamental: it can express any iteration

    unsigned fact_r(unsigned n, unsigned r) { return n > 0 ? fact_r(n - 1, r * n) : r; }
    double pow_r(double x, unsigned n, double r) { return n > 0 ? pow_r(x, n-1, x*r) : r; }

    unsigned …(unsigned n) { unsigned r = 1; while (n > 0) { r = r * n; n = n - 1; } return r; }
    double …(double x, unsigned n) { double r = 1; while (n > 0) { r = x * r; n = n - 1; } return r; }

Easier if function is written by accumulating a partial result
Stop test and initial result value are the same as in recursion
Recursion creates new instances of parameters for each recursive call, with new values dependent on the old ones: ex. n * r, n - 1, x * r, etc.
Iteration assigns new values to variables in each iteration, using the same rules / expressions. Ex. r = n*r, n = n - 1, r = x*r
Both variants return the accumulated result
Recursion and iteration both repeat a processing step => in a problem we use one or the other, rarely both

    unsigned readnat(void) {
      unsigned r = 0; int c;
      while (isdigit(c = getchar()))
        r = 10*r + c - '0';
      ungetc(c, stdin);
      return r;
    }
    int main(void) { printf(…, readnat()); return 0; }

Reading all input, doing nothing (body of while is empty, ; is the empty statement)
Do not write ; after while ( ) unintentionally!
    int main(void) { int c; while ((c = getchar()) != EOF); return 0; }
Reading and printing all input:
    while ((c = getchar()) != EOF) putchar(c);

    while ((c = getchar()) != EOF) if (c == …) puts(…);
Often, we search for more text of some sort after that character
looking for the first word:
    int main(void) {
      int c;
      while ((c = getchar()) != EOF)
        if (c == …) {
          while (isspace(c = getchar()));
          while (isalpha(c = getchar())) putchar(c);
        }
      return 0;
    }

Function that reads and prints up to a specified character
returns that character or EOF if reached before that char
    int …(int stopchar) {
      int c;
      while ((c = getchar()) != EOF && c != stopchar) putchar(c);
      return c;
    }
(c = getchar()) !
= EOF (assign, then compare)
    int …(int stopchar) {
      int c;
      while ((c = getchar()) != EOF && c != stopchar);
      return c;
    }
; after ) means an empty loop body

NO: char c = getchar();    YES: int c = getchar();
if char is unsigned, c will never compare equal to EOF (-1): will never leave a while (c != EOF) loop
if char is signed, reading byte 255 becomes -1 (EOF): a valid char (code 255) will be taken as EOF (early stop)
NO: while (!EOF)
EOF is a nonzero constant (-1), thus the condition is always false, the loop is never entered!
YES: while ((c = getchar()) != EOF)    and careful with the ( ) !
NO: while (c = getchar() != EOF)
!= has higher precedence, its result (0 or 1) is assigned to c
NO: c = getchar(); while (c != …) …    may loop forever!
YES: while ((c = getchar()) != … && c != EOF)    will exit!

If we search for text starting with a given char, continue checking for text in the branch that has found that char:
e.g. ignore … if followed by letters, print rest
    int main(void) {
      int c;
      while ((c = getchar()) != EOF) {
        if (c == …)
          if (isalpha(c = getchar())) while (isalpha(c = getchar()));
          else putchar(…);
        putchar(c);
      }
      return 0;
    }
This has a slight problem, do you notice?
if the string of letters ends with EOF, it will also try to print EOF
-1 converted to code 255 (strange character, ÿ)
When testing for a given char, must also test for EOF
When printing a char (e.g. after a loop), must check that it is not EOF
    int main(void) {
      int c;
      while ((c = getchar()) != EOF) {
        if (c == …) {
          if (isalpha(c = getchar())) while (isalpha(c = getchar()));
          else putchar(…);
          if (c != EOF) putchar(c);
        } else putchar(c);
      }
      return 0;
    }

Ex: ignore … followed by repeated text between braces: …{text1}{text2}
    int main(void) {
      int c;
      while ((c = getchar()) != EOF)
        if (c == …) {
          while ((c = getchar()) == '{')
            while ((c = getchar()) != '}')
              if (c == EOF) return 1;
          if (c != EOF) putchar(c);
        } else putchar(c);
      return 0;
    }
Often, it is useful to write functions for parts of the pattern (makes code more manageable)

    do
      statement
    while ( expression );
Sometimes we know that a cycle needs to be executed at least once (we read at least one character, a number has at least one digit)
Like the while loop, executes statement as long as the expression evaluates to true (nonzero)
Expression is (re)evaluated after every iteration
Equivalent with:
    statement
    while ( expression ) statement

We should consider: what variable
changes in each iteration? what is the loop continuation / stopping condition?
Don't forget the update of the variable that controls the loop (otherwise it will loop forever)
What do we know on exiting the loop? The loop condition is false
we consider this as we reason further about the program
We inspect / check / test the program:
- mentally, running it "pencil and paper" on simple cases
- then with increasingly complex tests, including corner cases

Marius Minea
10 October 2016

A simple definition (first programming course):
"Type = set of values together with some operations on that set"
A trivial error (at the ML prompt):
    # (+) 3 (fun x -> x);;
    Error: This expression should not be a function, the expected type is int
=> Some (syntactically correct) programs do not make sense

Generalizing:
"A type is any property of a program that we can establish without executing the program" Krishnamurthi, PLAI book
a mechanism for distinguishing good programs from bad (informally)
"A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute" Pierce

    (+) 1 (if unknown then 3 else (fun x -> x))
would run OK if unknown is true
would give an error otherwise
=> can't (always) decide
"Type systems are always prey to the Halting Problem"
=> a type system for a general-purpose language must always either over- or under-approximate:
either must accept programs that will error when executed
or must reject programs that might have run without error
Krishnamurthi, PLAI

[Cardelli and Wegner, On Understanding Types, Data Abstraction and Polymorphism]
The following are untyped universes
- bit-strings in computer memory
  everything represented as bit-strings => untyped = only one type
- S-expressions in (pure) Lisp
  no distinction between program and data
  but: some structure (more than bit-strings)
- λ-expressions in the λ-calculus
  everything is a function (numbers, booleans, if-then-else)
- Sets in set theory
  everything is an element
or a set (can encode mathematics)

Bit-strings can represent operations or characters, integers, …
Some S-expressions are lists, others are Lisp programs
Some λ-expressions (functions) represent booleans, or integers
Some sets may denote ordered pairs, leading to functions
=> Can think of universes as organized into types
But this is an illusion unless there is some means to enforce it
Typing may be:
- explicit (types part of syntax, e.g. all variables typed)
- implicit (can be reconstructed: type inference)

Types avoid problems related to exposing internal representation
Types impose constraints which help to enforce correctness
Types avoid logical inconsistencies ("set of all sets")
Types prevent inconsistent interactions between objects
"A type may be viewed as a set of clothes (or a suit of armor) that protects an underlying untyped representation from arbitrary or unintended use." Cardelli & Wegner
"Violating the type system involves removing the protective set of clothing and operating directly on the naked representation."

A program might have: [Cardelli, Type Systems]
- trapped errors: cause computation to stop
- untrapped errors: may go unnoticed
A program (fragment) is safe if it does not cause untrapped errors
A safe language: all program fragments are safe
But, we want more:
- no untrapped errors
- no trapped errors that we consider forbidden
- programmer must avoid other trapped errors

Static typing: type of every expression can be determined by static analysis at compile-time, e.g. ML, Java, Pascal (partly unsafe)
well-typed programs are well-behaved (conservatively)
Strong typing: languages in which all expressions are type-consistent, although the type itself may be statically unknown
can be done by introducing some run-time type checking
Static implies strong typing, but strong typing could be dynamic
Weak typing (weak checking): some unsafe operations detected
Pascal: untagged variants and function parameters unsafe
Modula-3: separates safe / unsafe modules

Strachey (1967) defines:
- parametric polymorphism: function works uniformly on a range of types (with some common structure)
- ad-hoc polymorphism: function works on several different types
(may not have common structure), may behave in unrelated ways

Refined classification [Cardelli and Wegner]:
    Polymorphism
      universal: parametric, inclusion
      ad-hoc: overloading, coercion
Overloading: integer constants may have both type int and real, purely syntactic
Coercion: an integer value can be used where a real is expected
conversions inferred at compile time (or even runtime: Lisp)
Subtyping: elements of subrange type also belong to supertype
Value sharing: nil constant shared by all the pointer types (Pascal): example of parametric polymorphism

Coercion: a single abstraction serves several types through implicit type conversion
Overloading: a single identifier denotes several abstractions
Parametric: an abstraction operates uniformly across different types
Inclusion: an abstraction operates through an inclusion relation
[Wm. Paul Rogers, Reveal the magic behind subtype polymorphism, JavaWorld, 2001]

Overloading: different functions with same name; context used to make decision
could view as syntactic abbreviation handled by preprocessing
e.g. multiple methods with same name, if signatures are distinct
Coercion: semantic operation, converts a type to that expected by a function (otherwise type error would occur)
can be done statically or dynamically
Distinction blurred at times
Discuss: 3 + 4    3.0 + 4    3 + 4.0    3.0 + 4.0

Parametric polymorphism: use of a single abstraction across different types, e.g. list abstraction 'a list
Inclusion polymorphism: subtyping and inheritance

Duck typing: "when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck"
a form of dynamic typing concerned with just the aspects of an object that are used, rather than the type of the object itself (entire interface)
Offers more freedom (polymorphism without inheritance)
Does not define an explicit interface
Can result in semantically unintended behavior

12 October 2016

Tests are generated based on the structure of code
Other (better) names: glass box, clear box, open box
Another classification: functional testing (black-box) / structural testing (white-box)
Comparison:
-
black-box: at any level / white-box: mostly module / unit testing
- white-box: code change => tests change
- white-box: easier detection of … but cannot detect … (in code or spec)

Control flow graph (CFG): graph representation of program and implicitly its execution paths
nodes = instructions
edges (labeled w/ conditions): sequencing between instructions
    x := a + b; y := a*b; while (y > a) { a := a + 1; x := a + b }
Usually, straight-line code is grouped together
=> basic block: a sequence of statements with just one entry and one exit point (no jumps into middle of code, or from code outside)
image: https://vinaytech.wordpress.com/2008/10/04/abstract-syntax-tree/

Test adequacy criterion = a criterion to measure if a set of tests is sufficient
What good are such criteria? For questions such as:
What program properties should we examine?
What test data do we select for such properties?
What objectives do we set for testing?
Did we test enough?
Burnstein, Practical Software Testing

… => i.e., through the CFG
But: number of program paths usually infinite (loops, recursion)
also: one path, multiple data (proper equivalence classes?)
=> must choose modest structural criteria
=> but not arbitrary, chosen …

Antiextensionality: There are equivalent programs P and Q such that a test suite T is adequate for P but not for Q
i.e. equivalent programs may need different test suites
General Multiple Change: There are programs P and Q that have the same form (structure) and a test suite T which is adequate for P but not for Q
i.e. close programs may need different test suites

Line coverage (also: statement coverage, basic block coverage)
Sufficient tests to execute each program statement
Obviously a necessary criterion (not executed = not tested)
obviously also insufficient
    … *s = NULL; (len …
a case where branch coverage does not subsume line coverage

Condition coverage
A condition is an atomic expression in a decision
needs tests for each possible value of a condition
apparently more complex than decision coverage, but does not subsume it
Example (x > 5 && y == 3)
Two tests: x = 6, y = 2 and x = 4, y = 3 generate all possible condition values (T and F, F and T) but follow the same branch (false)

Condition/decision coverage
Simultaneously covers both criteria
May need more tests than individual methods or just recombining them
Example (x > 5 && y == 3): two tests are still enough: x = 6, y = 3, and x = 4, y = 2
May be insufficient: the effect of some conditions may mask others

Multiple condition coverage
Tests all combinations for the components (conditions) of the decision
Exponential in number of conditions ($2^n$ tests for n conditions)
=> often too expensive to implement
in practice, some of the $2^n$ combinations
- may be irrelevant (for short-circuit evaluation)
- may be infeasible (when conditions are not independent)
=> in general, this requirement is not justified

MC/DC (modified condition/decision coverage)
One of the strongest criteria; initially developed at Boeing
is a requirement in avionics safety-critical systems (standard DO-178B)
Complete requirements for an MC/DC test suite:
- All program entry and exit points covered
- Each decision exercised on both branches
- Each condition takes both values
- Each condition is shown to affect its enclosing decision (keep other conditions
fixed, varying condition of interest)
Same tests, whether language has short-circuit evaluation or not

Start from base cases && and || with two conditions
AND operator && has a single case (t t) with result t
Changing any condition to f, result becomes f
Likewise for || (dual operator), switching t and f

    a  b  a && b            a  b  a || b
    f  t    f    (1)        t  f    t    (1)
    t  f    f    (2)        f  t    t    (2)
    t  t    t    (3)        f  f    f    (3)
    a: (1, 3)  b: (2, 3)
We indicate the pair of tests relevant for each condition:
(1, 3) shows a may influence decision; likewise, (2, 3) for b

For n conditions: a test with all the same, n tests with one each flipped
    a  b  c  a && b && c
    f  t  t      f     (1)
    t  f  t      f     (2)
    t  t  f      f     (3)
    t  t  t      t     (4)
    a: (1, 4)  b: (2, 4)  c: (3, 4)

Consider a && b && (c || d && e)
Start from innermost expression(s), d && e (watch precedence!)
    d  e  d && e
    f  t    f    (1)
    t  f    f    (2)
    t  t    t    (3)
    d: (1, 3)  e: (2, 3)
We then add c ||
Since || with f does not change truth, add c=f to all tests (1-3)
For the new test (4), choose test with f result (2) and add c=t
    c  d  e  c || d && e
    f  f  t      f     (1)
    f  t  f      f     (2)
    f  t  t      t     (3)
    t  t  f      t     (4)
    Now also shows effect of c:  c: (2, 4)  d: (1, 3)  e: (2, 3)
Now add a && b &&
To previous tests, add a=t, b=t
Then choose a test with t result (4), flip in turn a and b to f, showing a and b influence decision:
    a  b  c  d  e  a && b && (c || d && e)
    t  t  f  f  t      f     (1)
    t  t  f  t  f      f     (2)
    t  t  f  t  t      t     (3)
    t  t  t  t  f      t     (4)
    f  t  t  t  f      f     (5)
    t  f  t  t  f      f     (6)
    a: (4, 5)  b: (4, 6)  c: (2, 4)  d: (1, 3)  e: (2, 3)
Each test pair has one condition shown to influence outcome, all other conditions have the same value in both tests
By construction, it follows that n variables need n + 1 tests

Consider a && b || c && d
We write tests for both subexpressions (given by precedence)
    a  b  a && b            c  d  c && d
    f  t    f    (1')       f  t    f    (1")
    t  f    f    (2')       t  f    f    (2")
    t  t    t    (3')       t  t    t    (3")
    a: (1', 3')  b: (2', 3')  c: (1", 3")  d: (2", 3")
We combine with ||
Since || with f has no effect, choose one f test from each group (1' + 1") and combine with all tests in the other group
    a  b  c  d  a && b || c && d
    f  t  f  t      f     (1 = 1'+1")
    f  t  t  f      f     (2 = 1'+2")
    f  t  t  t      t     (3 = 1'+3")
    t  f  f  t      f     (4 = 2'+1")
    t  t  f  t      t     (5 = 3'+1")
    a: (1, 5)  b: (4, 5)  c: (1, 3)  d: (2, 3)
We have thus kept the influence of each individual condition

The above analysis is valid for independent conditions: it's always possible to generate the designed tests
In reality, conditions may be coupled (correlated)
Example: (z - x >= 3 && z - y >= 1 || y …
… influence the condition, we'd need x = 5, and z - y >= 1
But from these, we get z - x >= 3, thus the condition can't be false, and can't influence the decision!
=> trying to get MC/DC coverage, we can detect if a condition is written needlessly complex, or has irrelevant parts (a possible logic error)
In this case, since z - x … = 3 to true: (z - y >= 1 || y …

… e.g. combinations of successive if statements in the program
=> we need a criterion closer to path coverage (which would cover all execution paths)
Approach: identify n relevant predicates (conditions) in the program
Try to generate all $S \cdot 2^n$ possible combinations
S states (program locations), n predicates
=> correlates between them all states and predicates in the program
    … = a…; lo = 1, hi = n-1; … (lo … pivot) hi--; (lo … pivot …

[Beizer, Software Testing Techniques]
For simple cycles
- zero iterations (cycle is skipped)
  possibly also: negative counter - correct behavior?
- one iteration
- two iterations (may catch …
- one typical intermediate value
- N-1 iterations
- N iterations
- try to force N+1 iterations (more than assumed max)
For nonzero minimum: try min-1, min, min+1

Nested cycles:
1. minimal number of outer iterations; try inner cycle completely (as independent cycle)
2. continue following cycles outwards
   - with inner cycle at typical iteration count
   - vary count for current cycle
3. finally, vary all cycles together from min to max

- all paths that traverse a cycle once, without repetition (boundary test)
- all paths that repeat a test, at most once (interior test)

an LCSAJ sequence: straight line code followed by a jump
length N LCSAJ criterion: N such consecutive sequences
N = 1 ensures line coverage
N = 2 ensures branch coverage (even more)

Try changing decisions / statements according to some patterns to detect if the program runs differently
Examples:
- …
Marius Minea
marius@cs.upt.ro
12 October 2017

Write correct code
- minimizing risks
- with proper error handling
- avoiding security pitfalls
- portable
some C-specific, some general

Math is perfect, computers have limits
sometimes easier to reach than one might think
Numeric types differ in C and mathematics
in math: $\mathbb{Z} \subset \mathbb{R}$, both are infinite, $\mathbb{R}$ is dense / uncountable
in C: int, float, double are finite
both have limited range, reals have limited precision
have to remember this! (overflows, precision loss)

(Almost) all operations can give results that don't fit
For unsigned: called wraparound
Check before: if (y > UINT_MAX - x) …
or after: sum = x + y; if (sum < x) …

    if (-5 > 4333222111u) printf("-5 > 4333222111 !!!\n");
because -5 converted to unsigned has higher value
Correct comparison between int i and unsigned u:
    if (i >= 0 && i >= u) (compares i and u only if i is nonnegative)

DON'T right-shift a negative int!
    for ( ; n; n >>= 1) …
May loop forever if n negative; the topmost bit inserted is usually the sign bit (implementation-defined)
Use unsigned (inserts a 0)
DON'T shift with more than bit width (behavior undefined)
(in some implementations, shifts with count modulo bitwidth)

int (even long) may have small range (32 bits: ± 2 billion)
Not enough for computations with large integers (factorial, etc.)
Use double (bigger range) or arbitrary precision libraries (bignum)
Floating point has limited precision: beyond 1E16, double does not distinguish two consecutive integers!
A decimal value may not be precisely represented in base 2: it may be a periodic fraction: 1.2 (base 10) = 1.(0011) (base 2)
    printf("%f", 32.1f);    writes 32.099998
Due to precision loss in computation, the result may be inexact
replace an x == y test with fabs(x - y) < eps, for a small tolerance eps suited to the problem

DON'T: can't and should not use feof() as test in the read loop
Checking for end-of-input explicitly is rarely needed
The point of processing is to read data
=> thus we must check that data was read successfully:
    while (scanf("%d", &n) == 1) ...
On exit from the loop:
- if feof(stdin), input is finished
- else input does not match the format => read next char(s) and report

DON'T write code of the form
    while (!feof(stdin)) scanf("%d", &n);
After the last good read (number), end-of-input is not yet reached, unless there are no more separators (whitespace, incl. newline) after it
the next read will not succeed, but is not checked
If the read is checked (as it should be), testing EOF is not needed:
    while (scanf("%d", &n) == 1)

Often, we have to fill an array up to some stopping condition:
- read from input up to a given character (period, \n, etc.)
- copy from another string or array
Arrays must not be written beyond their length!
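The two rules above (check every read instead of testing feof, and never write past the end of an array) might look as follows. This is a minimal sketch: the helper names `read_ints` and `read_until` are hypothetical, and the functions take a `FILE *` rather than using stdin directly, purely so they are easy to try out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reads ints from `in` into a[0..cap-1].
   Stops at the first failed read OR when the array is full.
   Returns the number of values stored. */
size_t read_ints(FILE *in, int a[], size_t cap)
{
    size_t n = 0;
    assert(in != NULL && a != NULL);
    while (n < cap && fscanf(in, "%d", &a[n]) == 1)
        ++n;
    return n;
}

/* Copies chars from `in` into buf until `stop`, end of input,
   or cap-1 chars have been stored. Always '\0'-terminates.
   Returns the number of chars stored (excluding '\0'). */
size_t read_until(FILE *in, char buf[], size_t cap, int stop)
{
    size_t i = 0;
    int c;
    assert(in != NULL && buf != NULL);
    if (cap == 0)
        return 0;
    while (i < cap - 1 && (c = getc(in)) != EOF && c != stop)
        buf[i++] = (char)c;
    buf[i] = '\0';
    return i;
}
```

After `read_ints` returns, the caller can distinguish the cases as described above: `feof(in)` means the input is finished, otherwise either the array is full or the next input does not match the format.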
    for (i = 0; i < ... ; ...) ...

Errors: must report, perhaps give a chance to recover
Even printing to stdout could fail (redirected, no more space)

errno: global variable declared in errno.h
contains the code of the last error in a library function (illegal operation, file not found, not enough memory, etc.)
Careful: use errno only if sure that there was an error
for safety, reset it before calling a function that may fail
Function void perror(const char *s) from stdio.h
prints the user message s, a colon : and then the error description
(same as given by char *strerror(int errnum) from string.h)

n = atoi(s); returns 0 on error, but also for "0"
Avoid! Use only when the string is known to be good
    int n; char s[] = "..."; if (sscanf(s, "%d", &n) == 1) ...
but we don't know where processing of the string stopped
also does not signal overflow (if the number is too large)
    long strtol(const char *nptr, char **endptr, int base);
assigns to *endptr the address of the first unprocessed char
    char *end; n = strtol(s, &end, 10);    base 10 or other
similar functions exist for other types (e.g. strtoul, strtod)
they set errno to ERANGE on overflow

gets: removed in the C11 standard => DON'T use it
it is impossible to use safely: it did not limit the size read
Use instead
    char *fgets(char *s, int size, FILE *stream);
Reads up to and including newline \n, max size-1 characters
Stores the line in array s, adds '\0' at the end

DON'T: scanf("%s", str)
Leads to buffer overflow. Specify the max length in the format!
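The conversion and line-reading advice above could be combined as in the sketch below. The helper names `parse_int` and `read_line` are hypothetical; the library calls (`strtol`, `fgets`, `strcspn`) are the standard ones. The policy of rejecting trailing characters is an assumption, other programs may want to accept them.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Converts the WHOLE string s to an int (base 10).
   Returns false if there are no digits, if unprocessed characters
   remain, or if the value does not fit in an int. */
bool parse_int(const char *s, int *out)
{
    char *end;
    long v;
    assert(s != NULL && out != NULL);
    errno = 0;                        /* reset before the call */
    v = strtol(s, &end, 10);
    if (end == s)
        return false;                 /* nothing was converted */
    if (*end != '\0')
        return false;                 /* processing stopped early */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;                 /* overflow */
    *out = (int)v;
    return true;
}

/* Reads one line of at most size-1 chars into buf, dropping the '\n'.
   Returns false at end of input or on error. */
bool read_line(FILE *in, char *buf, int size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, size, in) == NULL)
        return false;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if any */
    return true;
}
```

Unlike atoi, `parse_int` distinguishes the valid input "0" from an error, and unlike sscanf with "%d" it reports both trailing junk and overflow.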
    char str[N];
    if (scanf("%...s", str) != 1) { ... }
with a field width of at most N-1 in the format (room is needed for the terminating '\0')

C is not memory-safe
No (safe) notion of memory object
Can create (almost) arbitrary pointers
Can't really pass an array to a function
- a pointer is passed instead
- the address carries no length information
Must pass the array length as a separate parameter
but the programmer is responsible for passing the correct value
even for a heap-allocated chunk, the length is available to the library, but not checked on use

strcpy -> strncpy - but was really designed for fixed-length strings
strcat - seldom a logical reason to use
strncat - careful: can write n+1 bytes
sprintf -> snprintf
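As an illustration of the last two points (length passed as a separate parameter, and snprintf instead of sprintf), here is a minimal sketch. The function names `format_pair` and `sum_ints` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "name=value" into dst, never more than cap bytes.
   Returns false if the text was truncated (or on encoding error):
   snprintf returns the length the full text WOULD have had. */
bool format_pair(char *dst, size_t cap, const char *name, int value)
{
    int n;
    assert(dst != NULL && name != NULL);
    n = snprintf(dst, cap, "%s=%d", name, value);
    return n >= 0 && (size_t)n < cap;
}

/* The parameter a is really a pointer: it carries no length,
   so the caller must pass len (and must pass it correctly). */
long sum_ints(const int a[], size_t len)
{
    long s = 0;
    for (size_t i = 0; i < len; ++i)
        s += a[i];
    return s;
}
```

For a true array `v` (not a pointer), the caller can compute the length as `sizeof v / sizeof v[0]`; inside `sum_ints` that expression would no longer give the array length.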
Marius Minea, marius@cs.upt.ro, 25 October 2017

The International Obfuscated C Code Contest, http://www.ioccc.org/
Best one-liner 2015: Visual factorization
    f(y,x){int m,z;for(m=z=1;m*m ...

Formal verification, Lecture 4. Marius Minea. 27 October 2005
Partial order reduction. Model checking with automata

System state space = cartesian product for components: $S = S_1 \times \dots \times S_n$
=> exponential in the number of components; may be impossible to build
Specifications given as automata allow on-the-fly verification algorithms:
=> only the needed parts of the state space are constructed

Approach: build automaton S from the negation of the specification
From product state $s = (r, q)$ with $r \in A$ (system) and $q \in S$ (spec):
- consider only those successors of r labeled the same as transitions from q
- if a counterexample is found, terminate without exploring the entire state space

Basic idea: build a reduced model
- state space and execution paths are subsets of the full (original) model
- preserves the same properties as the original model
The approach is sound if the excluded states / paths bring no extra information
- must determine an equivalence relation between paths
- such that the specification cannot distinguish between equivalent paths
- the reduced model should contain a representative from each equivalence class
Method named initially after the partial ordering of executed transitions
More generic term: model checking using representatives
Asynchronous composition => arbitrary ordering of concurrent events
=> n transitions generate $n!$ orderings and $2^n$ states
=> combinatorial (exponential) "explosion" of the resulting state space

Model: state-transition system $(S, T, S_0, L)$
A transition $\alpha \in T$ is a subset $\alpha \subseteq S \times S$ (viewed as a family of transitions with the same label)
Transition $\alpha$ is enabled in s: $\alpha \in enabled(s) \Leftrightarrow \exists s' \in S.\ \alpha(s, s')$
We consider only deterministic transitions: $\forall \alpha, s\ \exists! s'.\ \alpha(s, s')$ (for $\alpha$ enabled in s)
- the system may still be nondeterministic if $|enabled(s)| > 1$

Independence: two conditions, $\forall s \in S$:
Enabledness: $\alpha, \beta \in enabled(s) \Rightarrow \alpha \in enabled(\beta(s)) \wedge \beta \in enabled(\alpha(s))$
- two independent transitions do not disable each other
- but one may lead to the other being enabled
Commutativity: $\alpha, \beta \in enabled(s) \Rightarrow \alpha(\beta(s)) = \beta(\alpha(s))$
- effect of execution is the same, regardless of ordering

Invisibility (with respect to $AP' \subseteq AP$):
$\alpha \in T$ invisible $\Leftrightarrow \forall s, s' \in S.\ s' = \alpha(s) \Rightarrow L(s) \cap AP' = L(s') \cap AP'$
(does not change the labeling with propositions from AP')
typically: AP' = atomic propositions from the specification

In asynchronous composition, the next-time operator X is not relevant:
- two transitions in different components can occur in any order
- two transitions in the same component can be separated by arbitrarily many transitions in other components => the local state stays the same
Two infinite paths $\pi = s_0 s_1 \dots$ and $\pi' = r_0 r_1 \dots$ are stuttering equivalent, $\pi \sim_{st} \pi'$, if they can be split into pairwise corresponding finite blocks of identically labelled states:
$\exists$ infinite sequences $0 = i_0 < i_1 < \dots$ and $0 = j_0 < j_1 < \dots$ such that $\forall k \geq 0$:
$L(s_{i_k}) = L(s_{i_k + 1}) = \dots = L(s_{i_{k+1} - 1}) = L(r_{j_k}) = L(r_{j_k + 1}) = \dots = L(r_{j_{k+1} - 1})$
An LTL formula $A f$ is stuttering-invariant if $\forall \pi, \pi'$ with $\pi \sim_{st} \pi'$: $\pi \models f \Leftrightarrow \pi' \models f$
Theorem: Any $LTL_{-X}$ formula (without the X operator) is a stuttering-invariant property, and conversely
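The two independence conditions (enabledness and commutativity) can be checked by brute force on a toy system. The sketch below is only an illustration: the state (two bounded counters), the three transitions and all names are hypothetical, chosen so that one pair is independent and the others are not.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 2   /* hypothetical bound on each counter */

typedef struct { int x, y; } State;

/* A deterministic transition: a guard plus an effect. */
typedef struct {
    bool  (*enabled)(State);
    State (*apply)(State);
} Transition;

static bool  inc_x_en(State s) { return s.x < MAXV; }
static State inc_x_do(State s) { s.x++; return s; }
static bool  inc_y_en(State s) { return s.y < MAXV; }
static State inc_y_do(State s) { s.y++; return s; }
static bool  copy_en(State s)  { (void)s; return true; }
static State copy_do(State s)  { s.y = s.x; return s; }   /* y := x */

const Transition INC_X = { inc_x_en, inc_x_do };
const Transition INC_Y = { inc_y_en, inc_y_do };
const Transition COPY  = { copy_en,  copy_do  };

static bool same(State a, State b) { return a.x == b.x && a.y == b.y; }

/* Checks both conditions in every state where a and b are enabled. */
bool independent(Transition a, Transition b)
{
    for (int x = 0; x <= MAXV; ++x)
        for (int y = 0; y <= MAXV; ++y) {
            State s = { x, y };
            if (!a.enabled(s) || !b.enabled(s))
                continue;
            /* enabledness: neither transition disables the other */
            if (!a.enabled(b.apply(s)) || !b.enabled(a.apply(s)))
                return false;
            /* commutativity: same result in either order */
            if (!same(a.apply(b.apply(s)), b.apply(a.apply(s))))
                return false;
        }
    return true;
}
```

Here `INC_X` and `INC_Y` touch different variables and are independent, while `COPY` reads x and writes y, so it is dependent on both, matching the heuristic that transitions reading and writing a shared variable are dependent.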
The reduced model is constructed by selecting from each state only a subset of the transitions enabled in that state
The selection is made keeping, for every path from the original model M, a stuttering-equivalent path in the reduced model M'
$\Rightarrow \forall A f \in LTL_{-X}:\ M \models A f \Leftrightarrow M' \models A f$
Various names and selection criteria: stubborn sets [Valmari], persistent sets [Godefroid]; we use ample sets [Peled]
Selection of transitions: expressed by a set of conditions:

C0: $ample(s) = \emptyset \Leftrightarrow enabled(s) = \emptyset$
successor in the original model => there exists a successor in the reduced model

C1: A path from s cannot execute a transition dependent on a transition from ample(s) before executing a transition from ample(s)
Property: Transitions from ample(s) are independent of those in $enabled(s) \setminus ample(s)$
=> any path from a state s has one of the forms:
- a prefix $\alpha_1 \alpha_2 \dots \alpha_n \beta$, where $\beta \in ample(s)$, and $\alpha_i$ independent of $\beta$
- an infinite sequence $\alpha_0 \alpha_1 \dots$ with $\alpha_i$ independent of any $\beta \in ample(s)$

C2 (invisibility): $ample(s) \neq enabled(s) \Rightarrow ample(s) \subseteq invisible$
if s is not explored completely, all transitions from ample(s) are invisible

C3: A transition enabled in all states of a cycle must be included in ample(s) for at least one state s of the cycle
- guarantees that no portion of the state space is unexplored because of persistent
  ignoring of a transition
- implementation: in any cycle, a state is explored completely

For the path $\pi$ from s, we construct an equivalent path $\pi'$ in the reduced model:
a) if the next transition is in ample(s), we add it to $\pi'$
b) if the next transition in $\pi$ is not in ample(s) => cf. C2, transitions from ample(s) are invisible ($\exists$ transitions $\notin ample(s)$)
  b1) if in $\pi$ there is some transition $\beta \in ample(s)$, we add it to $\pi'$
    - cf. C1, $\beta$ is independent of the previous transitions
    - it's invisible, thus commuting it doesn't affect the spec
  b2) there are no transitions from ample(s) in $\pi$ => add an arbitrary transition $\beta \in ample(s)$ to $\pi'$
    - cf. C1, it does not enable successive transitions
    - it's invisible => does not affect the spec
    - cf. C3, this case appears a finite number of times

Conditions cannot be verified directly => conservative heuristics
- Transitions reading and writing a shared variable are dependent
- Conditional choices in the same process are dependent
- Communication transitions introduce dependencies in both processes
- Send operations on the same buffer are dependent. Likewise for receives from the same buffer
Transitions with disjoint process sets are independent
=> select a set P of processes which in the current state do not have communication operations with processes outside P
=> ample(s) = active transitions from P
ideally: few transitions in ample(s) (e.g. local transitions in a process)

We've discussed so far:
implementation (model): finite-state automaton
specification: formula in temporal logic (LTL, CTL)
Another view:
- the specification is also an automaton
- with "fewer details" than the implementation
- model checking for LTL: by converting the formula to an automaton

General idea:
- we check formulas $A f$
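  (f = path formula in which the only state subformulas are atomic propositions)
- $A f = \neg E \neg f$ => enough to consider $E f$
- we construct a tableau T for the formula f = an automaton (Kripke structure) that expresses all paths that satisfy f
- we compose the model M with the tableau T
- we check if there exists a path in the composition (with CTL model checking algorithms)

Let $AP_f$ be the set of atomic propositions that appear in f
$T = (S_T, R_T, L_T)$, with $L_T : S_T \to 2^{AP_f}$
Tableau states: sets of elementary formulas extracted from f
- $el(p) = \{p\}$ for $p \in AP_f$
- $el(\neg g) = el(g)$
- $el(g_1 \vee g_2) = el(g_1) \cup el(g_2)$
- $el(X g) = \{X g\} \cup el(g)$
- $el(g_1 U g_2) = \{X(g_1 U g_2)\} \cup el(g_1) \cup el(g_2)$
Set of tableau states: $S_T = \mathcal{P}(el(f))$

We associate to every subformula of f a set of states from T (intuitively: the set of states that satisfy the formula)
- $sat(g) = \{s \mid g \in s\}$ for $g \in el(f)$
- $sat(\neg g) = \{s \mid s \notin sat(g)\}$
- $sat(g_1 \vee g_2) = sat(g_1) \cup sat(g_2)$
- $sat(g_1 U g_2) = sat(g_2) \cup (sat(g_1) \cap sat(X(g_1 U g_2)))$
Transition relation: must be consistent with the semantics of X
- $X g \in s \Rightarrow \forall s'.\ R(s, s') \to g \in s'$
- $\neg X g \in s \Rightarrow \forall s'.\ R(s, s') \to g \notin s'$
$R_T(s, s') = \bigwedge_{X g \in el(f)} \left( s \in sat(X g) \Leftrightarrow s' \in sat(g) \right)$

Example, for $f = \neg a\ U\ r$:
$el(f) = \{a, r, X f\}$
$sat(f) = sat(r) \cup (\neg sat(a) \cap sat(X f))$
$R_T = (sat(X f) \times sat(f)) \cup (\neg sat(X f) \times \neg sat(f))$

We define $P = T \times M = (S_T, R_T, L_T) \times (S_M, R_M, L_M) = (S, R, L)$
- $S = \{(s_T, s_M) \mid s_T \in S_T,\ s_M \in S_M,\ L_T(s_T) = L_M(s_M) \cap AP_f\}$
- $R((s_T, s_M), (s'_T, s'_M)) = R_T(s_T, s'_T) \wedge R_M(s_M, s'_M)$
  (simultaneous transitions, only for identically labeled states)
Product: restricted to states from which there is at least one transition
Problem: T does not guarantee liveness (eventuality) properties:
$R_T$ ensures that from $sat(g\,U\,h)$ the path stays continually in $sat(g)$, but not also that it eventually reaches $sat(h)$
=> model checking with fairness: $\{sat(g\,U\,h) \to h \mid g\,U\,h \text{ appears in } f\}$
Theorem: $M, s_M \models E f \Leftrightarrow \exists s_T \in sat(f)$ such that $P, (s_T, s_M) \models_F EG\ True$
with fairness conditions $\{sat(g\,U\,h) \to h \mid g\,U\,h \text{ appears in } f\}$

Marius Minea, marius@cs.upt.ro, 17 October 2017

We've used the simple assignment: lvalue = expression
lvalue = what can be on the left-hand side of an assignment
so far: variable; see later: array element; pointer dereference
Compound assignment operators: += -= *= /= %=
x += expr is a shorthand for x = x + expr, etc.
later: also for bitwise operators: <<= >>= &= |= ^=
use them: shorter, and makes the intent of the transformation clearer

Increment / decrement, prefix / postfix: ++ --
++i increments i; the expression value is the value after the assignment
i++ increments i; the expression value is the value before the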
assignment
both have the same side effect (assignment) but a different value
    int x = 2, y, z;
    y = x++;    /* y=2, x=3 */
    z = ++x;    /* x=4, z=4 */
=> same effect as statements, not same value in expressions

In a complex expression, when do side effects actually take place?
Most operators have unspecified order of evaluation of operands (e.g., arithmetic)
=> only a partial order of computations is imposed
But: all side effects must complete before crossing a sequence point
Examples of sequence points (standard, Annex C)
- for function calls, between evaluating the function designator (function expression) + arguments, and the actual call
- for && || and the comma operator, between evaluating the first and second operand
- in ?: between evaluating the first operand and the second / third

"If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings." (C standard, 6.5 Expressions)
Thus, i = i++ or a[i] = i++ are undefined

Even when the order of side effects is well defined, use with caution!
DON'T write: return i++;
the assignment to i is useless, since the function returns
obscures intent: should it be return i; or return i+1; ?
DON'T: c = toupper(c); return c;
DO: return toupper(c);
DON'T read multiple characters in an expression:
    if (getchar() == '*' && (c = getchar()) == '/')
if the first comparison fails, the second char is not read (c has its previous / uninitialized value)
=> hard to reason about program behavior

The for statement
    for (init-clause; test-expr; update-expr)
        statement
is equivalent* with:
    init-clause;
    while (test-expr) {
        statement
        update-expr;
    }
* except: continue statement, see later
Any of the 3 parts in ( ) may be missing, but the semicolons stay
if test-expr is absent, it is considered true (infinite loop)
Before C99: the init part could only be an expression, e.g. i = 0
Since C99: init-clause can also be a declaration, e.g. int i = 0
the scope of declared identifiers is the loop body only
=> use loop scope for counters, if they are not needed later (scope of identifiers should only be as much as needed)
The semicolon ; is the empty statement
DO NOT use it after the closing ) of for unless you want an empty body!

Counting loops, up and down:
    for (int i = 0; i < n; ++i) ...
    for (int i = 1; i <= n; ++i) ...
    for (int i = n; i > 0; --i) ...
if direction does not matter, this is shortest:
    for (int i = n; i--;) ...
also easier to compare to zero
Warning: the test expression is computed every time => avoid expensive computations there, e.g.
    for (int i = 0; i < ...

Counting words:
    int main(void) {
        int c;
        unsigned nrw = 0;
        for (;; ++nrw) {
            while (isspace(c = getchar()));
            if (c == EOF) break;
            while (!isspace(c = getchar()) && c != EOF);
        }
        printf("%u\n", nrw);
        return 0;
    }
word = sequence of non-whitespace chars (common term usage)
whitespace: \t \n \v \f \r and space, as checked by isspace()

Capitalizing the first letter of each word:
    int main(void) {
        for (int c; (c = getchar()) != EOF; )
            if (isspace(c)) putchar(c);
            else {
                putchar(toupper(c));
                while ((c = getchar()) != EOF) {
                    putchar(c);
                    if (isspace(c)) break;
                }
            }
        return 0;
    }

The continue statement
jumps to the end of the current iteration, i.e., to update-expr in a for, and to the test in a while or do-while loop
Example (fragment, printing the prime factorization of n):
    for (unsigned d = 2; d*d <= ... ) { ... if (exp > 1) printf(..., exp); if (n > 1) putchar(...); } ... printf(..., n);
Use sparingly (much less common than break)
can make code clearer, if the decision to skip is early, and the loop is long
otherwise, a simple if may be easier to read and understand

The goto statement
Syntax: goto label ;
Jumps to the statement with the given label, only inside the same function
Any statement can be prefixed with a label followed by :
Discouraged (unstructured
code); ok to jump out of several loops
    int main(void) {
        int c;
        unsigned nc = 0, nw = 0, nl = 0;
        for ( ; (c = getchar()) != EOF; ++nc) {
            if (!isspace(c))
                for (++nc, ++nw; !isspace(c = getchar()); ++nc)
                    if (c == EOF) goto outloop;
            if (c == '\n') ++nl;
        }
    outloop:
        printf("%u %u %u\n", nl, nw, nc);
        return 0;
    }

The switch statement
Used for multiple branches depending on an integer value
can be clearer / more efficient than a multiple if
    int main(void) {
        int a = 3, b = 4, c, r;
        switch (c = getchar()) {
        case '+': r = a + b; break;
        case '-': r = a - b; break;
        case 'x': c = '*';              /* falls through */
        case '*': r = a * b; break;
        case '/': r = a / b; break;
        default: fputs("bad operator\n", stderr); return 1;
        }
        printf("%d %c %d = %d\n", a, c, b, r);
        return 0;
    }

Syntax: switch ( integer-expression ) statement
statement is usually a block with multiple statements, some labeled:
    case value: statement
    default: statement
The integer expression is evaluated
if the statement has a case label with that value, jump to it
Otherwise, if there is a default label, jump to it
Else, do nothing (goes on to the next statement after switch)
A statement may have multiple labels (flow jumps to the same code)
    case val1: case val2: statement
Normal statement sequencing applies:
control flow does not stop at the next label (it's just a label) => falls through
to exit the switch statement, use break;

A multiple if statement will do successive tests (until one succeeds)
A switch statement may be implemented using a jump table:
the expression is evaluated and used as index in a table of addresses
can be more efficient if the range of possible values is limited
(also: the compiler may limit the number of case labels to 1023, cf. standard)
More importantly: a switch may be clearer
But: not to forget break where needed!

Reason about loops:
what variables change in each iteration?
what is the loop continuation / stopping condition?
Don't forget the update of the variable that controls the loop!
(otherwise it will loop forever)
On exit, the loop condition is false
use this to reason about what happens next
Test the program:
mentally, running it "pencil and paper" on simple cases
then with more complex tests, including corner cases

Expression syntax: rigorously defined by a grammar
a frequent notation: Backus-Naur form (BNF)
Writing code: one function for each defined notion (nonterminal)

Prefix notation (no parentheses / precedence needed)
    expr ::= number | operator expr expr
Postfix notation
    expr ::= number | expr expr operator
Left recursive, can't decide the branch (the start is always a number)
rewrite the grammar:
    expr ::= number restexpr
    restexpr ::= ε | expr operator restexpr
ε is the usual notation for the empty string

Infix notation. Simplest attempt: ambiguous, no associativity or precedence
    expr ::= number | expr operator expr | ( expr )
=> separate additive / multiplicative expressions / operators
    expr ::= term | expr + term | expr - term
    term ::= factor | term * factor | term / factor
    factor ::= number | ( expr )
expr and term are still left-recursive; rewrite:
    expr ::= term restexpr
    restexpr ::= ε | + term restexpr | - term restexpr
    term ::= factor restterm
    restterm ::= ε | * factor restterm | / factor restterm
    factor ::= number | ( expr )

One function for each nonterminal
Function structure determined by the computation
    expr ::= term restexpr
restexpr needs the previous term, gets it as a parameter
    int expr(void) { return restexpr(term()); }
    restexpr ::= ε | + term restexpr | - term restexpr
restexpr is right-recursive; write as a function
    int restexpr(int t1) { int c = getchar(); if (c == '+') return restexpr(t1 + term()); ...
or rewrite as a loop within expr(), accumulating the expression value
    int expr(void) { int c, e = term(); for (;;) { if ((c = getchar()) == '+') e += term(); ...

18 October 2017

    // assume(n>2);
    void partition(int a[], int n) {
        int pivot = a[0];
        int lo = 1, hi = n-1;
        while (lo ...

Rule for while, with loop invariant I: the body preserves the invariant, and on exit (not E) the invariant implies the postcondition Q
    {I ∧ E} S {I}    I ∧ ¬E ⇒ Q
    ---------------------------
    {I} while E do S {Q}

Find n knowing it's initially between lo and hi:
    while (lo < hi) {              // invariant: lo <= n && n <= hi
        m = (lo + hi) / 2;
        if (n > m) lo = m + 1;     // n > m => n >= m+1 => n >= lo
        else hi = m;               // !(n > m) => n <= m => n <= hi
    }                              // both cases maintain lo <= n && n <= hi
    // lo >= hi && lo <= n && n <= hi => lo == n && n == hi
    assert(n == lo && n == hi);

Assignment through pointers
Consider {P} *x = 2 {v + *x = 4}
What is the precondition P ?
Right answer: v = 2 ∨ x = &v
But applying the assignment rule (v + *x = 4)[*x ← 2] loses the second case
We must model memory
m = memory, a = address, d = data
Consider the functions rd(m, a), returning d, and wr(m, a, d), returning m'
Rule:
$rd(wr(m, a_1, d), a_2) = \begin{cases} d & \text{if } a_2 = a_1 \\ rd(m, a_2) & \text{if } a_2 \neq a_1 \end{cases}$
We must derive a property of memory m from the relation:
    rd(wr(m, x, 2), &v) + rd(wr(m, x, 2), x) = 4
    rd(wr(m, x, 2), &v) + 2 = 4
    rd(wr(m, x, 2), &v) = 2
    (x = &v ∧ 2 = 2) ∨ (x ≠ &v ∧ rd(m, &v) = 2)
    x = &v ∨ v = 2

Weakest preconditions
E. W. Dijkstra, Guarded Commands, Nondeterminacy and Formal Derivation of Programs (1975)
- for a statement S and given postcondition Q there can be several preconditions P such that {P} S {Q} or [P] S [Q]
- Dijkstra establishes a necessary and sufficient precondition wp(S, Q) for successful termination of S with postcondition Q
- necessary (weakest): if [P] S [Q] then P ⇒ wp(S, Q)
- wp is a predicate transformer (transforms post- into precondition)
- allows defining a calculus with such transformations
Assignment: $wp(x := E, Q) = Q[x \leftarrow E]$ (see Hoare's rule)
Sequencing: $wp(S_1; S_2, Q) = wp(S_1, wp(S_2, Q))$
Decision: $wp(\text{if } E \text{ then } S_1 \text{ else } S_2, Q) = (E \Rightarrow wp(S_1, Q)) \wedge (\neg E \Rightarrow wp(S_2, Q))$
For loops, we need a recurrent computation
Define $wp_k$, assuming the loop finishes in at most k iterations:
$wp_0(\text{while } E \text{ do } S, Q) = \neg E \wedge Q$ (loop not entered)
$wp_{k+1}(\text{while } E \text{ do } S, Q) = (E \Rightarrow wp(S, wp_k(\text{while } E \text{ do } S, Q))) \wedge (\neg E \Rightarrow Q)$
Verification: check $Pre \Rightarrow wp(Prog, Post)$; for loops, check $I \wedge E \Rightarrow wp(LoopBody, I)$
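As a worked illustration of the wp rules for assignment, sequencing and decision, here are two small computations. The programs and postconditions are made up for this example, and integer overflow is ignored.

```latex
\begin{align*}
wp(x := x + 1;\ y := 2 * x,\ \ y > 4)
  &= wp(x := x + 1,\ wp(y := 2 * x,\ y > 4)) && \text{(sequencing)} \\
  &= wp(x := x + 1,\ 2 * x > 4)             && \text{(assignment: } y \leftarrow 2 * x\text{)} \\
  &= 2 * (x + 1) > 4                        && \text{(assignment: } x \leftarrow x + 1\text{)} \\
  &\equiv x > 1 \\[1ex]
wp(\text{if } x < 0 \text{ then } y := -x \text{ else } y := x,\ \ y \geq 0)
  &= (x < 0 \Rightarrow -x \geq 0) \wedge (\neg (x < 0) \Rightarrow x \geq 0) && \text{(decision)} \\
  &\equiv \mathit{true}
\end{align*}
```

The first result says the sequence establishes y > 4 exactly when it starts with x > 1; the second says the absolute-value computation establishes y ≥ 0 from any initial state.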
We define $P = T \times M = (S_T, R_T, L_T) \times (S_M, R_M, L_M) = (S, R, L)$
• $S = \{(s_T, s_M) \mid s_T \in S_T,\ s_M \in S_M,\ L_T(s_T) = L_M(s_M) \cap AP_f\}$
• $R((s_T, s_M), (s'_T, s'_M)) = R_T(s_T, s'_T) \wedge R_M(s_M, s'_M)$
model checking with fairness conditions: $\{sat(g\,U\,h \rightarrow h) \mid g\,U\,h \text{ appears in } f\}$
Theorem: $M, s_M \models E f \Leftrightarrow \exists s_T \in sat(f).\ P, (s_T, s_M) \models EG\,true$ under the fairness conditions above

Marius Minea marius@cs.upt.ro 17 October 2016

We've used the simple assignment: lvalue = expression
lvalue: variable; also: array element; pointer dereference
Compound assignment operators: += -= *= /= %=
x += expr is a shorthand for x = x + expr
see later: also bitwise assignment operators
i++ and ++i have the same effect as statements, but not the same value in expressions

The C standard defines sequence points; they define a partial order between evaluations (order specified for some but not all pairs)
All side effects must complete before crossing a sequence point
Examples of sequence points (Annex C):
- between evaluating the function designator (function expression) and arguments, and the actual call
- between evaluating the first and second operands of && || and , (comma)
- between evaluating the first operand of ?: and the second / third

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings. (C standard, 6.5 Expressions)
Thus, i = i++ or a[i] = i++ are undefined

Even when the order of side effects is well defined, use with caution!
DON'T write: return i++;
the assignment to i is useless, since the function returns
it obscures intent: should it be return i; or return i+1; ?
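The sequencing rules above can be illustrated with a minimal sketch. The helper names `post_inc` and `store_and_advance` are hypothetical, chosen only for this example; the point is that splitting the side effect into its own statement removes the unsequenced pair that makes `i = i++` and `a[i] = i++` undefined.

```c
#include <assert.h>

/* Returns the old value of *p and increments the caller's variable.
   The increment is useful here because it is visible after the call. */
static int post_inc(int *p) {
    return (*p)++;
}

/* Well-defined replacement for the undefined a[i] = i++:
   the store uses the current i, the result is the next index. */
static int store_and_advance(int a[], int i) {
    a[i] = i;       /* read of i and write of a[i] only */
    return i + 1;   /* states the intent directly, unlike return i++ */
}
```

A caller would write `i = store_and_advance(a, i);`, so each statement has a single side effect on `i`.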
DON'T: c = toupper(c); return c;
DO: return toupper(c);
DON'T read multiple characters in an expression:
    if (getchar() == '*' && (c = getchar()) == ' ')
if the first comparison fails, the second char is not read (c has its previous / uninitialized value)
=> hard to reason about program behavior

The break statement
Exits the immediately enclosing switch or loop statement
Used if we don't want to continue the remaining processing
Usually: if ( condition ) break;

    int main(void) {
      int c, nrw = 0;
      while (1) {
        while (isspace(c = getchar()));   /* skip whitespace */
        if (c == EOF) break;
        nrw = nrw + 1;                    /* a new word starts */
        while (!isspace(c = getchar()) && c != EOF);
      }
      printf("%d\n", nrw);                /* prints the word count */
      return 0;
    }

The for statement
    for (init-clause; test-expr; update-expr)
      statement
is equivalent* with:
    init-clause;
    while (test-expr) {
      statement
      update-expr;
    }
* except: continue statement, see later
Any of the 3 parts in ( ) may be missing, but the semicolons stay
if test-expr is absent, it is considered true (infinite loop)
Before C99: the init part could only be an expression, e.g. i = 0
Since C99: init-clause can also be a declaration, e.g. int i = 0
the scope of the declared identifiers is the loop only
=> use loop scope for counters, if they are not needed later (the scope of identifiers should only be as large as needed)
The semicolon alone is the empty statement. DO NOT write ; after the closing ) of for unless you want an empty body!
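The equivalence between for and while, and the intentional empty body, can be sketched as follows. The function names are assumptions made for this illustration only.

```c
#include <assert.h>

/* for loop with a C99 loop-scoped counter */
static int sum_for(int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += i;
    return s;
}

/* the same computation in the equivalent while form */
static int sum_while(int n) {
    int s = 0;
    int i = 0;          /* init-clause */
    while (i < n) {     /* test-expr */
        s += i;         /* statement */
        ++i;            /* update-expr */
    }
    return s;
}

/* intentional empty body: all the work happens in the loop header */
static int count_digits(unsigned n) {
    int d = 1;
    for (; n >= 10; n /= 10, ++d)
        ;               /* empty statement, placed on its own line */
    return d;
}
```

Putting the lone semicolon on its own line signals that the empty body is deliberate and not a stray `;` after the closing parenthesis.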
    int main(void) {
      int n = 5;
      while (n--) printf("%d ", n);
      n = 5;
      for (int i = 0; i < n; ++i) printf("%d ", i);
      for (int i = 1; i <= n; ++i) printf("%d ", i);
      for (int i = n; i > 0; --i) printf("%d ", i);
      for (int i = n; --i >= 0; ) printf("%d ", i);
      return 0;
    }
if direction does not matter, this is shortest: for (int i = n; i--;)
also easier to compare to zero
Warning: the test expression is computed every time => avoid costly computations in it, e.g. for (int i = 0; …

    int main(void) {
      int c;
      while ((c = getchar()) != EOF) {
        if (!isspace(c)) {
          putchar(toupper(c));          /* first letter of a word */
          while ((c = getchar()) != EOF) {
            putchar(c);
            if (isspace(c)) break;      /* end of word */
          }
        } else putchar(c);
      }
      return 0;
    }

The continue statement
jumps to the end of the loop body in a while, do or for loop
i.e. to update-expr in for and to the test in do or while
    for (… d = 2; d*d … ) … printf(…, exp); if (n > 1) putchar(…); continue; … printf(…, n);
Use sparingly (appears much less often than break)
can make code clearer, if the decision to skip is early, and the loop is long
otherwise, a simple if may be cleaner and clearer

The goto statement
Syntax: goto somelabelname ;
Jumps to the statement with the given label, only inside the same function
Any statement can be labeled with somelabelname: statement
Discouraged (unstructured code); ok to jump out of several loops
    int main(void) {
      int c;
      unsigned nc = 0, nw = 0, nl = 0;
      for ( ; (c = getchar()) != EOF; ++nc) {
        if (!isspace(c))
          for (++nc, ++nw; !isspace(c = getchar()); ++nc)
            if (c == EOF) goto outloop;
        if (c == '\n') ++nl;
      }
    outloop:
      printf("%u %u %u\n", nl, nw, nc);   /* lines, words, characters */
      return 0;
    }

The switch statement
Used for multiple branches depending on an integer value
can be clearer / more efficient than a multiple if ... else
    int main(void) {
      int a = 3, b = 4, c, r;
      switch (c = getchar()) {
        case '+': r = a + b; break;
        case '-': r = a - b; break;
        case 'x': c = '*';              /* e.g. an alias for '*'; falls through */
        case '*': r = a * b; break;
        case '/': r = a / b; break;
        default: fputs("unknown operator\n", stderr); return 1;
      }
      printf("%d %c %d = %d\n", a, c, b, r);
      return 0;
    }
Syntax: switch ( integer-expression ) statement
statement is a block with multiple statements, some labeled:
    case value: statement
    default: statement
The integer expression is evaluated
if the statement has a case label with that value, jump to it
Otherwise, if there is a default label, jump to it
Else, do nothing (goes on to the next statement after switch)
A statement may have multiple labels (flow jumps to the same code): case val1: case val2: statement
Normal statement sequencing applies: flow does not stop at the next case label (it's just a label)
to exit the switch statement, use a break statement
A multiple if ... else statement will do successive tests (until one succeeds)
A switch statement may be implemented
using a jump table: the expression is evaluated and used as index in a table of addresses
can be more efficient if the range of possible values is limited
(also: a compiler may limit the number of case labels to 1023, cf. standard)
More importantly: a switch may be clearer
But: not to forget break where needed!

Writing loops. We should consider:
what variable changes in each iteration?
what is the loop continuation / stopping condition?
Don't forget to update the variable that controls the loop (otherwise it will loop forever)
What do we know on exiting the loop? The loop condition is false
we consider this as we reason further about the program
We inspect / check / test the program:
mentally, running it with "pencil and paper" on simple cases
then with increasingly complex tests, including corner cases

Expression syntax: rigorously defined by a grammar
a frequent notation: Backus-Naur form (BNF)
Writing code: one function for each defined notion (nonterminal)

Prefix and postfix expressions (no parentheses / precedence needed)
    expr ::= number | operator expr expr
    expr ::= number | expr expr operator
The second is left recursive => both variants start alike, can't decide the choice
rewrite the grammar:
    expr ::= number restexpr
    restexpr ::= ε | expr operator restexpr
ε is the usual notation for the empty string

Infix expressions
Simplest attempt: does not deal with precedence or parentheses:
    expr ::= number | expr operator expr | ( expr )
=> distinguish additive / multiplicative expressions / operators
    expr ::= term | expr + term | expr - term
    term ::= factor | term * factor | term / factor
    factor ::= number | ( expr )
which variant to choose? expr or term?
=> rewrite:
    expr ::= term restexpr
    restexpr ::= ε | + term restexpr | - term restexpr
    term ::= factor restterm
    restterm ::= ε | * factor restterm | / factor restterm
    factor ::= number | ( expr )
One function for each nonterminal
Function structure determined by the computation
expr ::= term restexpr
restexpr needs the previous term: gets it as a parameter
    int expr(void) { return restexpr(term()); }
restexpr ::= ε | + term restexpr | - term restexpr
restexpr is right-recursive: write as a function
    int restexpr(int t1) {
      int c = getchar();
      if (c == '+') return restexpr(t1 + term());
      if (c == '-') return restexpr(t1 - term());
      ungetc(c, stdin);     /* ε case: put the character back */
      return t1;
    }
or rewrite as a loop within expr(), accumulating the expression value
    int expr(void) {
      int c, e = term();
      for (;;) {
        if ((c = getchar()) == '+') e += term();
        else if (c == '-') e -= term();
        else { ungetc(c, stdin); return e; }
      }
    }

19 October 2016

    assume(n > 2);
    void partition(int a[], int n) {
      int pivot = a[…];
      int lo = 1, hi = n-1;
      while (lo … pivot) hi--;
      if (lo …

Loop rule: on loop exit ($\neg E$), the invariant implies the postcondition Q
$$\frac{\{I \wedge E\}\ S\ \{I\} \qquad I \wedge \neg E \Rightarrow Q}{\{I\}\ \textbf{while}\ E\ \textbf{do}\ S\ \{Q\}}$$

Find n knowing it's initially between lo and hi:
    /* invariant: lo <= n && n <= hi */
    while (lo < hi) {
      int m = (lo + hi) / 2;
      /* both cases maintain lo <= n <= hi */
      if (n > m) lo = m + 1;   /* n > m => n >= m+1 => n >= lo */
      else hi = m;             /* !(n > m) => n <= m => n <= hi */
    }
    /* lo <= n <= hi && !(lo < hi) => lo == n && n == hi */
    assert(n == lo && n == hi);

Consider {P} *x = 2 {v + *x = 4}
What is the precondition P?
Right answer: $v = 2 \vee x = \&v$
But applying the assignment rule $(v + {*x} = 4)[{*x} \leftarrow 2]$ loses the second case
We must model memory
m = memory, a = address, d = data
Consider the functions rd(m, a), which returns d, and wr(m, a, d), which returns m'
Rule: $rd(wr(m, a_1, d), a_2) = \begin{cases} d & \text{if } a_2 = a_1 \\ rd(m, a_2) & \text{if } a_2 \neq a_1 \end{cases}$
We must derive a property of memory m from the relation:
$rd(wr(m, x, 2), \&v) + rd(wr(m, x, 2), x) = 4$
$rd(wr(m, x, 2), \&v) + 2 = 4$
$rd(wr(m, x, 2), \&v) = 2$
$(x = \&v \wedge 2 = 2) \vee (x \neq \&v \wedge rd(m, \&v) = 2)$
$x = \&v \vee v = 2$

E. W. Dijkstra, Guarded Commands, Nondeterminacy and Formal Derivation of Programs (1975)
- for a statement S and a given postcondition Q there can be several preconditions P such that {P} S {Q} or [P] S [Q]
- Dijkstra establishes a necessary and sufficient precondition wp(S, Q) for successful termination of S with postcondition Q
- necessary (weakest): if [P] S [Q] then $P \Rightarrow wp(S, Q)$
- wp is a predicate transformer (transforms the postcondition into a precondition)
- allows defining a calculus with such transformations
Assignment: $wp(x := E, Q) = Q[x \leftarrow E]$ (see Hoare's rule)
Sequencing: $wp(S_1; S_2, Q) = wp(S_1, wp(S_2, Q))$
Decision: $wp(\textbf{if}\ E\ \textbf{then}\ S_1\ \textbf{else}\ S_2, Q) = (E \Rightarrow wp(S_1, Q)) \wedge (\neg E \Rightarrow wp(S_2, Q))$
For loops, we need a recurrent computation
Define $wp_k$, assuming the loop finishes in at most k iterations:
$wp_0(\textbf{while}\ E\ \textbf{do}\ S, Q) = \neg E \wedge Q$ (loop not entered)
$wp_{k+1}(\textbf{while}\ E\ \textbf{do}\ S, Q) = (E \Rightarrow wp(S, wp_k(\textbf{while}\ E\ \textbf{do}\ S, Q))) \wedge (\neg E \Rightarrow Q)$
In practice: compute wp(Prog, Post); for loops, check $I \wedge E \Rightarrow wp(LoopBody, I)$

Marius Minea marius@cs.upt.ro 1 November 2017

Compilers: generate executable code
Not a compilers course
Basics for understanding / analyzing / reverse engineering code
Compilation phases:
- Preprocessing
- Lexical analysis (scanner)
- Syntactic analysis (parser)
- Semantic analysis, e.g. type checking
- Code generation
Concrete syntax includes representation details (keywords, punctuation)
Abstract syntax represents the conceptual structure (no keywords, but various node types, attributes, etc.)
implicit language elements (conversions, …) may appear explicitly
ifStmt => abstract syntax tree with children Cond, ThenStmt, ElseStmt
The AST is the starting point for subsequent processing -> CFG
    int a = 0, b, c = 0;
    do { b = a + 1; c = c + b; a = 2 * b; } while (a …
Optimizations:
- common subexpressions (e.g. x * x): compute once into a temp, use several times
- strength reduction (e.g. x * 4 -> x << 2): replace expensive with simpler operations (esp. in loops)
  need to know: loop invariants, loop induction variables
    for (… = 0, s = 0; i …
- loop unrolling: copy the loop body several times, reducing the loop count
  the count may be statically known or not
Classic example: Duff's device
    void send(short *to, short *from, int count) {
      int n = (count + 7) / 8;
      switch (count % 8) {
      case 0: do { *to = *from++;
      case 7:      *to = *from++;
      case 6:      *to = *from++;
      case 5:      *to = *from++;
      case 4:      *to = *from++;
      case 3:      *to = *from++;
      case 2:      *to = *from++;
      case 1:      *to = *from++;
              } while (--n > 0);
      }
    }

Verification of timed
systems
2 November 2005
- discrete and continuous time
- extensions of untimed model checking algorithms
- quantitative temporal logics
- timed automata

Real-time systems = systems whose functional correctness depends on satisfying temporal constraints
- safety-critical systems (aviation, military)
- high-speed asynchronous circuits
- process control and fabrication systems
- communication protocols
- consumer electronics (increasingly so, incl. automotive control)
- timed synchronization protocols (e.g. in distributed systems)

Any real system executes in physical time
=> the untimed models studied so far are merely an abstraction
e.g. temporal logic expresses qualitative, not quantitative properties
Still: most formalisms start from an untimed description on top of which the time dimension is added later
- Discrete time: all events happen at multiples of a time quantum
  models: e.g., automata with an integer duration for each transition
- Continuous time: events happen at arbitrary moments on a time scale of reals
  models: timed automata, timed Petri nets, languages with timing constructs
Few formalisms are created specifically with time as a first-class feature
e.g., Duration Calculus [Zhou, Hoare, Ravn '91], with operators:
- $\int f$: duration for which f holds (integral over time)
- concatenation of two time intervals
sample property: gas leak less than 30 seconds in any one-hour interval

Formal verification Lecture 5 Marius Minea

Discrete versus continuous time
What is the difference in expressiveness and efficiency?
(How does a continuous-time system compare to its model?)
[Henzinger, Manna, Pnueli '92] discuss timed transition systems (automata with lower / upper bounds on transitions)
- discretization preserves some properties, e.g., invariance (G p) and response (p => F q) with time limits
- for other properties, discrete-time versions can be derived
[Asarin, Maler, Pnueli '98] discuss combinational circuits, with limited time delays on the output of every gate:
- for acyclic circuits, there is a discretization quantum which preserves qualitative properties (ordering of events), e.g., 1/n for a circuit with n signals
- there are circuits whose qualitative behavior is not preserved by any discretization (e.g., a ring of 3 inverters)

Scheduling analysis
One of the main problems: given a set of processes and their parameters (periods, deadlines), is there a schedule that satisfies the deadlines?
Rate-monotonic scheduling [Lehoczky, Liu, Layland]
- assigns priorities in order of periods: the shorter the period, the higher the priority (provably optimal)
- schedulability test based on total CPU utilization (%)
- Advantages: simple method, optimal, fast analysis
- Disadvantages: restrictive model (periodic processes + some extensions); incomplete method, not applicable to high loads (> ln 2 ≈ 70%)
We discuss more general approaches

RTCTL (real-time CTL) can express quantitative temporal properties (e.g., p does not appear earlier than 5 time units) but not a more detailed analysis (what is the maximum delay of p)
=> We define algorithms that can calculate such parameters and have an efficient implementation (with BDDs)
- length of the shortest and longest path between two sets of states (expressed by predicates that characterize them), e.g., longest execution time
- minimal / maximal number of occurrences of a property on a path, e.g., how many times the process is in the wait state
[Courcoubetis & Yannakakis; Campos, Clarke et al.]
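The shortest-path computation between two sets of states can be sketched with explicit bitmask sets standing in for BDDs. This is a minimal illustration under stated assumptions, not the algorithm of the cited papers: the model has at most 32 states, `NSTATES`, `image` and `min_path` are hypothetical names, and `succ[s]` is assumed to hold the bitmask of the successors of state `s`.

```c
#include <assert.h>
#include <stdint.h>

#define NSTATES 8   /* hypothetical small model, must be <= 32 */

/* Set of all successors of the states in 'set' (one-step image). */
static uint32_t image(const uint32_t succ[], uint32_t set) {
    uint32_t img = 0;
    for (int s = 0; s < NSTATES; ++s)
        if (set & (UINT32_C(1) << s))
            img |= succ[s];
    return img;
}

/* Length of the shortest path from some state in 'start' to some state
   in 'final'; returns -1 if no state of 'final' is reachable. */
static int min_path(const uint32_t succ[], uint32_t start, uint32_t final) {
    uint32_t reached = start;           /* states reachable in <= i steps */
    for (int i = 0; ; ++i) {
        if (reached & final) return i;  /* final first reached at step i */
        uint32_t next = reached | image(succ, reached);
        if (next == reached) return -1; /* fixpoint: nothing new explored */
        reached = next;
    }
}
```

The reached set only grows, so the loop stops after at most NSTATES iterations, either on the first intersection with the final set or at the fixpoint.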
Breadth-first search from start until final is first reached or no new states are explored
in each iteration: Q = states reached in i steps
R = set of all reached states, grows until fixpoint
procedure min(start, final) for (i …

Need additional features to express real-time properties (e.g., time-bounded response)
Large variety of logics with explicit time, depending on various choices:
- linear or branching time
- discrete or continuous time
- with timed operators or explicit time variables
Depending on the choices => differences in expressivity, decidability, algorithmic complexity

RTCTL: simplifies expressing temporal properties in the discrete case by augmenting temporal operators with time intervals
Example: response in at most 10 time units to a continuous request: $AG(p \rightarrow A[p\,U_{[0,10]}\,q])$
$AG(p \rightarrow AF_{[0,3]}\,q)$: p always followed by q within at most 3 time units
For a = 0, b = ∞, we obtain CTL semantics
Algorithms: recursive, by modification of the fixpoint algorithms:
- $E[f\,U_{[a,b]}\,g] = f \wedge EX\,E[f\,U_{[a-1,b-1]}\,g]$ (a > 0)
- $E[f\,U_{[0,b]}\,g] = g \vee (f \wedge EX\,E[f\,U_{[0,b-1]}\,g])$ (b > 0)
- $E[f\,U_{[0,0]}\,g] = g$

TPTL [Alur, Henzinger 1989]
- extension of the propositional fragment of LTL (only path formulas)
- linear time, discrete (interpreted over state sequences)
- uses explicit variables for time, but with restrictions: each variable is bound to the time in a certain state (by a quantifier)
Example: "each p is followed by a q within at most 10 time units"
$\Box x.(p \rightarrow \Diamond y.(q \wedge y \le x + 10))$

TCTL
Syntax: $\varphi ::= p \mid \neg\varphi \mid \varphi_1 \rightarrow \varphi_2 \mid \exists \varphi_1\,U_{\sim c}\,\varphi_2 \mid \forall \varphi_1\,U_{\sim c}\,\varphi_2$
(with $\sim$ one of $<, \le, =, \ge, >$), $p \in AP$, $c \in \mathbb{N}$
Semantics: $s \models \exists \varphi_1\,U_{\sim c}\,\varphi_2$ iff $\exists \rho, \exists t \sim c$ such that $\rho(t) \models \varphi_2$ and for all $0 \le t' < t$, $\rho(t') \models \varphi_1$

Timed automata: finite automata extended with clocks; each clock measures the time passed since an event
- Set of clock constraints B(C) = conjunctions of terms of the form $x \sim c$, $c \sim x$, $x - y \sim c$, with $x, y \in C$ and $\sim$ a comparison operator such as < or ≤
- I associates to each state an invariant (limiting the passage of time in that state)
- $T \subseteq S \times \Sigma \times B(C) \times 2^C \times S$ = set of transitions
Transition (s, a, g, R, s') from s to s' labeled by a is executed only if g is true, and resets the clocks in $R \subseteq C$

Semantics
Set of states: pairs (s, v), where $s \in S$ is a location and $v : C \to \mathbb{R}$ is a clock assignment
- the automaton can stay in a state as long as the invariant is satisfied (it can leave the state, but cannot be forced to unless the invariant becomes false)
- or it can (but need not) execute instantaneous transitions when the associated guard is true
Two types of transitions:
- action: $(s, v) \xrightarrow{a} (s', v')$ if there exists a transition $(s, a, g, R, s') \in T$, the guard g(v) is true for the assignment v, and v' is obtained from v by resetting the clocks from R: $v' = v[R \leftarrow 0]$
- delay: $(s, v) \xrightarrow{d} (s, v + d)$, $d \ge 0$, if the invariant I(s) holds throughout the delay
=> transition system with infinitely many states
Paths of the form $(s_0, v_0) \rightarrow (s_1, v_1) \rightarrow (s_2, v_2) \rightarrow \cdots$

Parallel composition
Execute synchronous transitions if labels match, separate transitions otherwise
Let $A_1 = (S_1, S_{01}, \Sigma_1, C_1, I_1, T_1)$ and $A_2 = (S_2, S_{02}, \Sigma_2, C_2, I_2, T_2)$, with $C_1 \cap C_2 = \emptyset$
Define $A = A_1 \| A_2 = (S_1 \times S_2, S_{01} \times S_{02}, \Sigma_1 \cup \Sigma_2, C_1 \cup C_2, I, T)$, where:
- $I((s_1, s_2)) = I_1(s_1) \wedge I_2(s_2)$
- if $(s_1, a, g_1, R_1, s'_1) \in T_1$ and $(s_2, a, g_2, R_2, s'_2) \in T_2$, with $a \in \Sigma_1 \cap \Sigma_2$, then $((s_1, s_2), a, g_1 \wedge g_2, R_1 \cup R_2, (s'_1, s'_2)) \in T$ (synchronization)
- if $(s_1, a_1, g_1, R_1, s'_1) \in T_1$, with $a_1 \in \Sigma_1 \setminus \Sigma_2$, then $\forall s_2 \in S_2$, $((s_1, s_2), a_1, g_1, R_1, (s'_1, s_2)) \in T$
- if $(s_2, a_2, g_2, R_2, s'_2) \in T_2$, with $a_2 \in \Sigma_2 \setminus \Sigma_1$, then $\forall s_1 \in S_1$, $((s_1, s_2), a_2, g_2, R_2, (s_1, s'_2)) \in T$
More general: with a synchronization function to match transition labels

Fischer's mutual exclusion protocol
correctness based on respecting time constraints
Synchronization: pairs of transitions a? and a!
Can prove: correct if the time constants (here 1 and 2) have this ordering

Continuous-time model => more precise than discrete time; appropriate for modeling asynchronous and transient behavior
E.g. delay element: propagates input to output
- if the input pulse is not shorter than l
- with delay at most u

State = pair (s, v) of location and clock assignment
=> the state space is infinite (even uncountable)
But: we cannot observe behavior with arbitrary precision
- constraints from the automaton have integer time limits
- formulas of temporal logic also have integer constants
Questions:
- when are two states (s, v) and (s, v') with the same location, but different clock assignments, equivalent?
- and is there a finite number of equivalence classes?
Two approaches:
- time regions => region graph = finite automaton
- time zones => geometric constraints, symbolic exploration

When are two states (s, v) and (s, v') equivalent?
1. if the same transitions can be taken from both states
- conditions on transitions can have arbitrary integer time bounds
  e.g. there may be transitions a with x > 4 and b with x < …
  => must have the same integer part for all clocks
  (s, x = 4) cannot execute a, but (s, x = 4.1) can
  => fractional parts must both be nonzero or both zero
2. must execute transitions in the same order
  consider transitions a with x > 2 and b with y > 3
  from state (s, x = 1.5, y = 2.7) can execute b before a
  from state (s, x = 1.4, y = 2.3) can execute a before b
  => the states are not equivalent
  => clocks must have the same ordering of the fractional parts

[Alur …] $v \cong v'$ iff:
- $\forall x \in C$: $\lfloor v(x) \rfloor = \lfloor v'(x) \rfloor \vee (\lfloor v(x) \rfloor > c_x \wedge \lfloor v'(x) \rfloor > c_x)$
where $c_x \in \mathbb{Z}$ is the largest constant with which x is compared in the automaton
(integer parts of clocks are either equal in both assignments or both exceed the largest constant)
- $\forall x, y \in C$: … (condition on the fractional parts)
Region [v] associated with state (s, v) = set of states (s, v') with $v \cong v'$
=> representation with a finite number of equivalence classes

Ex: region graph for two clocks and maximal constant c = 3
Regions are:
- 0-dimensional: points with integer coordinates, $x, y \in \{0, 1, 2, 3\}$
- one-dimensional: segments, diagonals; open-ended segments (> 3)
- two-dimensional: bounded (triangles) or not (rectangular stripes)
From two states (points) in the same region:
- the same transitions can be executed
- by passage of time, the same regions are traversed

[Alur, Courcoubetis, Dill '90]
For the timed automaton A, define the finite-state automaton R(A):
- states of R(A) are regions
- there is a transition between r and r' if and only if
  r' is the successor region of r with respect to time passage, or
  there is an action transition (s, v) -> (s', v') between two representatives (timed states) $(s, v) \in r$ and $(s', v') \in r'$
Can prove: TCTL model checking for a timed automaton reduces to CTL model checking for the region graph (with additional clocks for the time bounds on operators)
Size of the region graph: bounded by $|C|! \cdot 2^{|C|} \cdot \prod_{x \in C} (2 c_x + 2)$
- exponential in the number of clocks
- exponential in the size of the (binary) encoding of the time constants

Region graph: exponential in the number of clocks => often costly to build and analyze
=> alternative representation with temporal inequalities
Consider zones which are maximal with respect to the passage of time in a location (up to the time limit imposed by the invariant)
=> initial zones: $(s_0, I(s_0) \wedge \bigwedge_{i,j} (x_i = x_j))$ with $s_0 \in S_0$, $x_i, x_j \in C$
timed zone = condition from B(C)
Successor of a zone A by a transition (s, a, g, R, s'):
- intersect with the guard g
- reset the clocks in R
- let time pass: $\exists t \ge 0.\ \varphi(v - t)$ (eliminate inequalities $x \sim d$)
- impose the invariant of the destination state (conjunct with I(s'))
Overall: $A' = ((A \wedge g)[R \leftarrow 0])\!\Uparrow \wedge\ I(s')$
=> a zone = a convex union of regions

A zone = conjunction of inequalities $x - y \sim c$, $x \sim c$ or $c \sim x$
=> can be represented as a square matrix of size |C| + 1 (one line for each clock and one line for comparing with zero)
Matrix elements are integers from the interval [-c, c]:
value d for element (x, y) ($x, y \in C$) means $x - y \sim d$ …
a bound above $c_{max}$ becomes $x \sim \infty$ …

… have finite memory space (a few bytes); reals have …
To correctly work with numbers, we must understand:
- representation and storage in memory
- limitations => errors
Any value (parameter, variable, also constant) needs to be represented in memory and takes up some program space
bit = unit of data storage that may hold 0 or 1; need not be individually addressable (can't refer to just one bit)
byte = addressable unit of data storage that may hold a character
formed of bits: CHAR_BIT ≥ 8 bits (limits.h); 8 bits in all usual architectures
the sizeof operator: gives the size of a type or value in bytes, not bits
    sizeof (type) or sizeof expression
sizeof(char) is 1; for Unicode and wide character support: uchar.h, wctype.h
an int has sizeof(int) bytes => CHAR_BIT * sizeof(int) bits
all ints, big (10000) and small (5), use sizeof(int) bytes!
sizeof is NOT a function; it is evaluated (if possible) at compile-time

In memory, numbers are represented in binary (base 2)
Unsigned: for N bits, the value is computed as
$c_{N-1} c_{N-2} \ldots c_1 c_0\,{}_{(2)} = c_{N-1} \cdot 2^{N-1} + \ldots + c_1 \cdot 2^1 + c_0 \cdot 2^0$
$c_{N-1}$ = most significant (higher-order) bit (MSB)
$c_0$ = least significant (lower-order) bit (LSB)
Range of values: from 0 to $2^N - 1$, e.g. 11111111 is 255
LSB $c_0 = 0$ => even number; $c_0 = 1$ => odd number
Signed: MSB is the sign; N-1 bits of value: several encodings
i) sign-magnitude: if the MSB is 1, take the value part as negative
ii) one's complement: the sign bit counts as $-(2^{N-1} - 1)$
iii) two's complement: the sign bit counts as $-2^{N-1}$
Range for two's complement is from $-2^{N-1}$ to $2^{N-1} - 1$
$c_{N-1} c_{N-2} \ldots c_1 c_0\,{}_{(2)} = -c_{N-1} \cdot 2^{N-1} + c_{N-2} \cdot 2^{N-2} + \ldots + c_0 \cdot 2^0$
(8-bit: 0..127 are unchanged; 128..255 become -128..-1)
8-bit: 11111111 is -1, 11111110 is -2, 10000000 is -128

Before the type int one can write specifiers for:
size: short, long, since C99 also long long
sign: signed (implicit, if not present), unsigned
Can be combined; may omit int: e.g. unsigned short
char: signed char [-128, 127] or unsigned char [0, 255]
int, short: ≥ 2 bytes, must cover [-2^15 (-32768), 2^15 - 1] (the standard itself only requires -32767 as the lower limit)
long: ≥ 4 bytes, must cover [-2^31 (-2147483648), 2^31 - 1]
long long: ≥ 8 bytes, must cover [-2^63, 2^63 - 1]
Corresponding signed and unsigned types have the same size
sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long)
DO use sizeof to find the storage taken up by a type / variable
DON'T write programs assuming a given type has 2, 4, 8, ... bytes: the program will fail on other systems
    #include <limits.h>
    #include <stdio.h>
    int main(void) {
      printf("%zu\n", sizeof(int));
      printf("%d\n", INT_MIN);
      printf("%u\n", UINT_MAX);
      return 0;
    }

Integer constants
base 10: as usual, e.g., -5
base 8: prefixed by 0 (zero): 0177 (127 decimal)
base 16: prefixed by 0x or 0X: e.g., 0x1aE (430 decimal)
Can't write in any other base (e.g. binary 1101110)
suffixes: u or U for unsigned, e.g., 65535u
l or L for long, e.g., 0177777L; ll or LL for long long
Character constants
printable: with single quotes: '0', '!', 'a'
special characters: '\0' nul, '\a' alarm, '\b' backspace, '\t' tab, '\n' newline, '\v' vert. tab, '\f' form feed, '\r' carriage return, '\"' double quote, '\'' quote, '\\' backslash
octal (max 3 digits): '\14'
hexadecimal (prefix x): '\xff'
0xFF: int 255; '\xff' may be -1
The type of a character constant is int
Char values (of smaller size) are promoted to int in expressions (this is why you don't see functions with char parameters)

Bitwise operators let us:
access the internal representation of data (e.g., numbers)
encode information (e.g. header fields in
network packets or files; status values / commands from / to hardware)
represent sets of small integers: one bit per element (1 = is member; 0 = is not member of the set)
one 32-bit int for any set of ints 0..31 (4 billion combinations)
Set operations: intersection: bitwise AND; union: bitwise OR; add element: set the corresponding bit
Date and time can be represented using bits: min / sec (0-59): 6 bits; hour (0-23): 5 bits; day (1-31): 5 bits; month (1-12): 4 bits; year: 6 bits left from 32: 1970-2033
=> need operations to get day / month / year from the 32-bit value

Can be used only for integers: all operators work with integer operands! Not float!
They work on all bits independently (not just one bit!)
&   bitwise AND          (1 only if both bits are 1)
|   bitwise OR           (1 if at least one of the bits is 1)
^   bitwise XOR          (1 if exactly one of the bits is 1)
~   bitwise complement   (opposite value: 0 <-> 1)
<<  left shift by the number of bits in the second operand
    vacated bits are filled with zeros; leftmost bits are lost
>>  right shift by the number of bits in the second operand
    vacated bits are filled with zero if the number is unsigned or nonnegative
    else implementation-dependent (usually repeats the sign bit)
Examples:
    01101010 & 10101101 = 00101000
    ~01101010           = 10010101
    01101010 | 10101101 = 11101111
    01101010 ^ 10101101 = 11000111
    11101010 >> 2 = 00111010      11101010 << 2 = 10101000     (for unsigned numbers!)

Bit operators do not change their operands, they just give a result
if x is 7, x+2 is 9, but x is still 7. Only x = x+2 changes x
Bitwise operators are no different!
x & 0xF or x >> 2 will compute some results, x will be the same!

    void printoct(unsigned n) {
      if (n >= 8) printoct(n / 8);
      putchar('0' + n % 8);
    }
8 = 2^3 => each octal digit corresponds to a group of 3 bits
e.g. one hundred is 0...001 100 100 (8^2 + 4 * 8 + 4)
=> can use bit operators to isolate parts
    void printoct(unsigned n) {
      unsigned n1 = n >> 3;
      if (n1) printoct(n1);
      putchar('0' + (n & 7));
    }
Likewise, can use groups of 4 bits to obtain hex digits
careful to get either '0'..'9' or 'A'..'F' for printing

Bitwise operators work with all bits
But by choosing the appropriate operation and operand ("mask") we can access a single bit
1) … m >>= 1) putchar(n & m ?
'1' : '0');
2) constant mask, shift the number
    for (… m = 1; m; m …) putchar(… ? '1' : '0');
3) same, but directly check the sign bit
    for (… m = 1; m; m …

1 << k is 2^k (for k smaller than the bit width)
n >> k has value n / 2^k (integer division) for unsigned / nonnegative n
use this, not pow (which is floating-point!)
~(1 << k): only bit k is 0, the rest are 1
0 has all bits 0, ~0 has all bits 1 (= -1, since it's a signed int)
~ preserves signedness, so ~0u is unsigned (UINT_MAX)
Bit ops produce results (like +, *, etc.), they don't change operands
Only assignment operators (also through a pointer dereference) change values!

Value given by bits 0-3 of n: AND with 0...01111(2): n & 0xF
Reset bits 2, 3, 4: AND with ~0...011100(2): n &= ~0x1C
Set bits 1-4: OR with 11110(2): n |= 0x1E or n |= 036
Flip bits 0-2 of n: XOR with 0...0111(2): n ^= 7
=> choose the fitting operator and mask (easier written in hex / octal)

Building masks (prefer the unsigned ~0u: left-shifting a negative value is undefined):
integer with all bits 1:               ~0 (signed) or ~0u (unsigned)
k rightmost bits 0, rest 1:            ~0u << k
k rightmost bits 1, rest 0:            ~(~0u << k)
~(~0u << k) << p                       has k bits of 1, starting at bit p, rest 0
(n >> p) & ~(~0u << k)                 n shifted p bits, reset all except the last k
n & (~(~0u << k) << p)                 reset all except the k bits starting at bit p

We have discussed scope (visibility) and lifetime (storage duration)
Linkage: how do the same names in different scopes / files link?
identifiers declared with the keyword static have internal linkage (are not linked to objects with the same name in other files)
Storage duration if declared static is the lifetime of the program
in a function: local scope, but preserves its value between calls
initialization is done only once, at the start of the lifetime
    int counter(void) {
      static int cnt = 0;
      return cnt++;
    }
    int main(void) {
      printf("%d ", counter());
      printf("%d\n", counter());
      return 0;
    }

Floating-point constants: with decimal point, optional sign and exponent (prefix e or E); the integer or fractional part may be missing:
    2.   .5   1.e-6   .5E+6
suffix f, F: float; l, L: long double
implicit type of floating constants: double
float arguments of variadic functions are promoted to double, e.g. in calls to printf, where %f means double
Sample from <float.h>:
float: 4 bytes, ca. 10^-38 to 10^38
    FLT_MIN 1.17549435e-38F      FLT_MAX 3.40282347e+38F
double: 8 bytes, ca. 10^-308 to 10^308
    DBL_MIN 2.2250738585072014e-308      DBL_MAX 1.7976931348623157e+308
long double: for higher range and precision (size is platform-dependent, e.g. 12 bytes)

Similar to scientific normalized notation in base 10: 6.022 * 10^23, 1.6 * 10^-19: one nonzero digit, decimals, exponent of 10
in the computer, base 2: $(-1)^{sign} \cdot 2^{exp} \cdot 1.mantissa_{(2)}$ (significand)
float: 1 sign bit, 8 exponent bits, 23 mantissa bits
9.75 = 2^3 * 1.21875 = 2^3 * 1.00111(2) is stored as
    0 | 10000010 | 0011100...0      (sign | 127+3 | 23-bit mantissa)
Extracting the mantissa M (the low-order 23 bits) and adding the implicit 1 on bit 23 gives the 24-bit significand
1.00...01(2) is 1 + 2^-23 (last bit of the mantissa is 1)
For larger numbers, imprecision grows
e.g., 2^24 + 1 = 2^24 * (1 + 2^-24): the last 1 bit does not fit in the mantissa
=> can represent 2^24 and 2^24 + 2, but 2^24 + 1 must be rounded to one of them (to 2^24 under the usual round-to-nearest-even mode)
    FLT_EPSILON 1.19209290e-07F      DBL_EPSILON 2.2204460492503131e-16
E = 0: 0 and small (denormal) numbers: $(-1)^s \cdot 2^{-126} \cdot 0.M_{(2)}$
E = 255: ±INFINITY, NAN (not-a-number, error)
Use double for sufficient precision in computations!
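The float layout described above can be inspected with a short sketch. It assumes that float is the 32-bit IEEE 754 single-precision format (true on all usual platforms, but not guaranteed by the C standard); the helper names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float; assumes sizeof(float) == 4 (IEEE 754 binary32). */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}

static unsigned sign_bit(float f)      { return float_bits(f) >> 31; }
static unsigned biased_exp(float f)    { return (float_bits(f) >> 23) & 0xFF; }
static uint32_t mantissa_bits(float f) { return float_bits(f) & 0x7FFFFF; }

/* 24-bit significand: the stored mantissa plus the implicit 1 on bit 23
   (valid for normalized numbers only). */
static uint32_t significand(float f) {
    return mantissa_bits(f) | (UINT32_C(1) << 23);
}
```

For 9.75f the three fields are 0, 130 (127 + 3) and the mantissa bits 00111 followed by zeros, matching the worked example.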
math.h functions: take and return double; variants with suffix: sin, sinf, sinl
The C standard also specifies rounding directions, exceptions / traps, etc.

int (even long) may have a small range (32 bits: ± 2 billion)
Not enough for computations with large integers (factorial, etc.)
Use a type with a bigger range or arbitrary precision libraries (bignum)
Floating point has limited precision: beyond 1E16, double does not distinguish two consecutive integers!
A decimal value may not be precisely represented in base 2: it may be a periodic fraction: 1.2(10) = 1.(0011)(2)
    printf("%f", 32.1f); writes 32.099998
Due to precision loss in computation, the result may be inexact
replace the test x == y with fabs(x - y) < a small tolerance …
    … u) puts(…);
compile with -Wconversion and -Wsign-compare or -Wextra

Implicit conversions (summary of previous rules)
integer to floating point, smaller type to larger type
integer promotion: short, char, bool to int
Conversions in assignment: truncated if the lvalue is not large enough
    char c; int i; c = i;
!!! The right-hand side is evaluated independently of the left-hand side !!!
    int eur_rol = 43000, usd_rol = 31000;
    double eur_usd = eur_rol / usd_rol;
(integer division happens before the assignment to double)
Floating point is truncated towards zero when assigned to int (the fractional part disappears)
Cast: ( typename ) expression
converts the expression as if assigned to a value of the given type
    eur_usd = (double)eur_rol / usd_rol

char may be signed or unsigned (implementation dependent, check CHAR_MIN: 0 or SCHAR_MIN)
different int conversion if bit 7 is 1 ('\xff' = -1)
getchar / putchar work with unsigned char converted to int

Overflow: almost any arithmetic operation can cause overflow
    printf("%d\n", 1222000333 + 1222000333);
(if 32-bit, the result has the higher-order bit 1, and is considered negative; formally, signed overflow is undefined behavior)
    printf("%u\n", 2154000111u + 2154000111u);
Careful when comparing / converting signed and unsigned
    if (-5 > 4333222111u) printf("-5 > 4333222111 !!!\n");
because -5 converted to unsigned has a higher value
Correct comparison between int i and unsigned u:
    if (i < 0 || i < u)        if (i >= 0 && i >= u)
(compares i and u only if i is nonnegative)
Check for overflow on the integer sum int z = x + y:
    if (x > 0 && y > 0 && z < 0 || x < 0 && y < 0 && z >= 0)
(this test runs after the fact and relies on wraparound; checking before the addition is safer)
DON'T right-shift a negative int!
    for ( ; n; n >>= 1) …
May loop forever if n is negative; the topmost bit inserted is usually the sign bit (implementation-defined)
Use unsigned (inserts a 0)
DON'T shift by more than the bit width (behavior undefined)
AND with a one-bit mask is not 0 or 1, but 0 or nonzero
n & (1 << k) is either 0 or 1 << k; use (n >> k) & 1 to get 0 or 1

Live variables: the direction of analysis is backwards
Meet (combine) operation:
$$LV_{out}(s) = \begin{cases} \emptyset & \text{if } succ(s) = \emptyset \\ \bigcup_{s' \in succ(s)} LV_{in}(s') & \text{otherwise} \end{cases}$$
=> combination is union (may, at least one path)
Computation: worklist algorithm that makes changes from the initial values until there are no more changes (a fixpoint is reached)

Available expressions
At every program point, what are the expressions whose value is available (previously computed) without having changed on any path to that point?
if the value is stored in a temp / register, we need not recompute it
Transfer function:
$AE_{out}(s) = (AE_{in}(s) \setminus \{e \mid V(e) \cap write(s) \neq \emptyset\}) \cup \{e \in Subexp(s) \mid V(e) \cap write(s) = \emptyset\}$
(expressions at the entry of s that have not been changed by s, and any expressions computed in s without change to their variables)
Meet (combine) operation:
$$AE_{in}(s) = \begin{cases} \emptyset & \text{if } pred(s) = \emptyset \\ \bigcap_{s' \in pred(s)} AE_{out}(s') & \text{otherwise} \end{cases}$$
=> combination done by intersection (must, on all paths); the analysis direction is forward

Very busy expressions
What expressions must be evaluated on any path from the current point before any of their variables is modified?
=> evaluation can be hoisted up to the current point, before any branches
- a backwards and must (universal) analysis
$VBE_{in}(s) = (VBE_{out}(s) \setminus \{e \mid V(e) \cap write(s) \neq \emptyset\}) \cup Subexp(s)$
$VBE_{out}(s) = \emptyset$ if $succ(s) = \emptyset$
$VBE_{out}(s) = \bigcap_{s' \in succ(s)} VBE_{in}(s')$ otherwise

Common features, for each problem: we analyze some property, e.g.
- value of a variable at a program point
- or interval of values for a variable
- or sets of variables (live), expressions (available, very busy)
- possible definitions for a value (reaching definitions), etc.
Abstract view: a set D of values for a property (dataflow facts)
Restriction: D is a lattice
A lattice is a partially ordered set in which every pair of elements has a least upper bound and a greatest lower bound (an element "larger", resp. "smaller" than either of them)
Ex: powerset of a set (intersection, union)
Ex: set of divisors of a number (gcd, least common multiple)
images: http://en.wikipedia.org/wiki/File:Hasse_diagram_of_powerset_of_3.svg
http://en.wikipedia.org/wiki/File:Lattice_of_the_divisibility_of_60.svg

In the program: statements change the program state
e.g., the value of a variable after a statement is a function of its value before the statement
In the property domain: each statement s has an associated transfer function $F(s) : D \to D$ that determines how the value of a property at the start of a statement is changed by that statement:
$Val_{out}(s) = F(s)(Val_{in}(s))$ (for analyses going forward), or conversely (for backwards analyses)
Restriction: analysis is easier for monotone transfer functions: $x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)$
(intuition: if the argument is more precise, so is the result)
Special case: bitvector frameworks: the lattice is a powerset, P(D), and transfer functions are monotone, of the form:
$F(s)(v) = (v \setminus kill(s)) \cup gen(s)$
(v = dataflow fact, gen / kill(s) = information generated / deleted by s)

Example for forward analyses:
$Val_{out}(s) = F(s)(Val_{in}(s))$
$Val_{in}(s) = \sqcap_{s' \in pred(s)} Val_{out}(s')$
where $\sqcap$ is meet (combining effects) over several paths (could be $\cap$ or $\cup$)
initially, we know the value of $Val_{out}(entry)$
For backwards analyses, we initially know
$Val_{in}(exit)$, and the roles of in and out are switched

To compute a solution to this equation system: an iterative algorithm that propagates changes in the direction of the analysis
    foreach s ∈ N do Val_in(s) = ⊤           // no info
    Val_in(entry) = init                     // depending on analysis
    W = {entry}
    while W ≠ ∅
        choose s ∈ W
        old_out = Val_out(s)
        W = W \ {s}
        Val_in(s) = ⊓ over s' ∈ pred(s) of Val_out(s')
        Val_out(s) = F(s)(Val_in(s))
        if Val_out(s) ≠ old_out then
            forall s' ∈ succ(s) do W = W ∪ {s'}

Termination of the analysis is guaranteed if the transfer function is monotone: $x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)$, which implies that the computed values change monotonically (and stabilize in a lattice of finite height)
Def: A fixpoint of a function f is a value x so that f(x) = x
The Knaster-Tarski theorem guarantees that a monotone function over a complete lattice has a least and a greatest fixpoint
The worklist algorithm computes the least fixpoint solution for the equation system given by the transfer functions

Meet over all paths
We wish to compute the combined effect of the program statements:
For a path (statement sequence) $p = s_1 s_2 \ldots s_n$ we define $F(p) = F(s_n) \circ \ldots \circ F(s_2) \circ F(s_1)$
and we wish to compute: $\sqcap_{p \in Path(Prog)} F(p)(entry)$
The iterative algorithm combines effects at each join point before continuing the computation
Since functions are monotone, we have: $f(x \sqcup y) \sqsupseteq f(x) \sqcup f(y)$, so the analysis loses precision
Distributive transfer functions satisfy: $f(x) \sqcup f(y) = f(x \sqcup y)$
in this case, the iterative fixpoint algorithm is equivalent with meet over all paths
=> combining info on execution paths does not lose precision
All 4 classical examples (live variables, etc.) are distributive

Classification of analyses
- forward or backwards
- must or may
- flow-sensitive or flow-insensitive (flow = control flow)
  e.g., does the statement order in the program matter?
  - no: for variables used / changed, called functions, etc.
  - yes: for properties linked to actual values computed by the program
- context-sensitive or context-insensitive:
  is the analysis of a function / procedure specialized depending on the call site or not?
  (generic function summary)
- path-sensitive or path-insensitive:
  does it account for correlation between execution paths?

Formal verification, Lecture 5: Verification of timed systems (Marius Minea), 2 November 2005
- discrete and continuous time
- extensions of untimed model checking algorithms
- quantitative temporal logics
- timed automata

Timed systems = systems whose functional correctness depends on satisfying temporal constraints
- safety-critical systems (aviation, military)
- high-speed asynchronous circuits
- process control and fabrication systems
- communication protocols
- consumer electronics (increasingly so, incl. automotive control)
- timed synchronization protocols (e.g. in distributed systems)

Any real system executes in physical time
=> the untimed models studied so far are merely an abstraction
e.g. temporal logic expresses qualitative (ordering), not quantitative properties
Still: most formalisms start from an untimed description on top of which the time dimension is added later
- discrete time: all events happen at multiples of a time quantum
  models: e.g., automata with an integer duration for each transition
- continuous time: events happen at arbitrary moments on a time scale of reals
  models: timed automata, timed Petri nets, languages with timing constructs
Few formalisms are created specifically with time as a first-class feature
e.g., duration calculus [Zhou, Hoare, Ravn '91], with operators:
- $\int f$: duration for which f holds (integral over time)
- concatenation of two time intervals
sample property: gas leak less than 30 seconds in any one-hour interval

Discrete vs. continuous time
What is the difference in expressiveness and efficiency?
(How does a continuous-time system compare to its discrete model?)
[Henzinger, Manna, Pnueli '92]: discuss timed transition systems (automata with lower / upper bounds on transitions)
- discretization preserves some properties
  e.g., invariance (Gp) and response (p => Fq) with time limits
- for other properties, discrete-time versions can be derived
[Asarin, Maler, Pnueli '98] discuss combinational circuits, with bounded time delays on the output of every gate:
- for acyclic circuits, there is a discretization quantum which preserves qualitative properties (ordering of events)
  e.g., 1/n for a circuit with n signals
- there are circuits whose qualitative behavior is not preserved by any discretization (e.g., ring of 3 inverters)

Schedulability analysis
One of the main problems: given a set of processes and their parameters (periods, deadlines), is there a schedule that satisfies the deadlines?
Rate-monotonic scheduling [Lehoczky, Liu, Layland]
- assigns priorities by period: the shorter the period, the higher the priority (provably optimal)
- schedulability test based on total CPU utilization (%)
- Advantages: simple method, optimal, fast analysis
- Disadvantages: restrictive model (periodic processes + some extensions); incomplete method, not applicable to high loads (> ln 2 ≈ 70%)
We discuss more general approaches

Quantitative analysis
RTCTL (real-time CTL) can express quantitative temporal properties (e.g., p does not appear earlier than 5 time units) but not a more detailed analysis (what is the maximum delay of p?)
=> We define algorithms that can calculate such parameters and have an efficient implementation (with BDDs)
- length of the shortest and longest path between two sets of states (expressed by predicates that characterize them)
  e.g., longest execution time
- minimal / maximal number of occurrences of a property on a path
  e.g., how many times the process is in the wait state
[Courcoubetis, Yannakakis; Campos, Clarke et al.]
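The utilization-based test mentioned for rate-monotonic scheduling can be sketched as follows. This is a minimal illustration, not the method from the cited papers: the `struct task` layout and function names are assumptions, and the test uses only the asymptotic bound ln 2 quoted on the slide (the exact Liu-Layland bound for n tasks, n(2^(1/n) - 1), is slightly larger for small n).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task description: worst-case execution time and period,
 * in the same time unit. */
struct task {
    double cost;
    double period;
};

/* Total CPU utilization: sum of cost / period over all tasks. */
double utilization(const struct task *tasks, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; ++i)
        u += tasks[i].cost / tasks[i].period;
    return u;
}

/* Sufficient (not necessary) test: if utilization is at most ln 2,
 * rate-monotonic priorities meet all deadlines. A false result means
 * "unknown", not "unschedulable". */
bool rm_schedulable_sufficient(const struct task *tasks, size_t n)
{
    static const double LN2 = 0.6931471805599453;
    return utilization(tasks, n) <= LN2;
}
```

The one-sided answer is what the slide calls an incomplete method: task sets above the bound need a more detailed analysis.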
Minimal path length
Breadth-first search from start until final is first reached or no new states are explored
in each iteration: Q = states reached in i steps
R = set of all reached states, grows until fixpoint
    procedure min(start, final)
        for (i ← 0, R ← Q ← start; Q ∩ final = ∅; i++) do
            ...

Temporal logics for real time
Need additional features to express real-time properties (e.g., time-bounded response)
Large variety of logics with explicit time, depending on various choices:
- linear or branching-time
- discrete or continuous time
- with timed operators or explicit time variables
Depending on choices => differences in expressivity, decidability, algorithmic complexity

RTCTL
Simplifies expressing temporal properties in the discrete case by augmenting temporal operators with time intervals
Consider a path $\pi = s_0 s_1 s_2 \ldots$ We define:
- $\pi \models f\,U_{[a,b]}\,g \Leftrightarrow \exists i.\ a \le i \le b \wedge s_i \models g \wedge \forall j < i.\ s_j \models f$
- $\pi \models F_{[a,b]}\,g \Leftrightarrow \exists i.\ a \le i \le b \wedge s_i \models g$
Model checking uses recurrences on the interval bounds:
- $E[f\,U_{[a,b]}\,g] = f \wedge EX\,E[f\,U_{[a-1,b-1]}\,g]$ (a > 0)
- $E[f\,U_{[0,b]}\,g] = g \vee (f \wedge EX\,E[f\,U_{[0,b-1]}\,g])$ (b > 0)
- $E[f\,U_{[0,0]}\,g] = g$

TPTL [Alur, Henzinger 1989]
- extension of the propositional fragment of LTL (only path formulas)
- linear time, discrete (interpreted over state sequences)
- uses explicit variables for time, but with restrictions: each variable is bound to the time in a certain state (by a quantifier)
Example: "each p is followed by a q within at most 10 time units"
$G\,x.(p \rightarrow F\,y.(q \wedge y \le x + 10))$

TCTL
Temporal operators carry a time bound $\sim c$, with $\sim\ \in \{<, \le, =, \ge, >\}$, $p \in AP$, $c \in \mathbb{N}$
Semantics: $s \models \exists \phi_1 U_{\sim c}\, \phi_2 \Leftrightarrow$ there is a path $\rho$ from s and a time $t \sim c$ such that $\rho(t) \models \phi_2$ and for all $0 \le t' < t$, $\rho(t') \models \phi_1$

Timed automata
A clock measures the time passed since an event
[Figure: example timed automaton with clocks x, y]
Set of clock constraints B(C) = conjunctions of terms of the form $x \sim c$ or $x - y \sim c$
A timed automaton is a tuple $(S, S_0, \Sigma, C, I, T)$: locations, initial locations, labels, clocks, invariants $I : S \to B(C)$, and transitions $(s, a, g, R, s')$ with guard g and set R of clocks to reset
A state is a pair (s, v), where $s \in S$ = location and $v : C \to \mathbb{R}$ is a clock assignment
- the automaton can stay in a state as long as the invariant is satisfied (it can leave the state but cannot be forced to unless the invariant becomes false)
- or it can (but need not) execute instantaneous transitions when the associated guard is true
Two types of transitions:
- action: $(s, v)$
$\xrightarrow{a} (s', v')$ if there exists a transition $(s, a, g, R, s') \in T$, the guard g(v) is true for assignment v, and v' is obtained from v by resetting the clocks from R: $v' = v[R := 0]$
- passage of time: $(s, v) \xrightarrow{d} (s, v')$ if $v' = v + d$ ($v'(x) = v(x) + d$, $\forall x \in C$), and $I(s)(v + e)$ is true $\forall e \in [0, d]$ (invariant is preserved)
=> transition system with infinitely many states
Paths of the form $(s_0, v_0) \to (s_0, v_1) \to (s_1, v_1) \to (s_1, v_2) \to \ldots$

Product of timed automata
Execute synchronous transitions if labels match, separate transitions otherwise
Let $A_1 = (S_1, S_{01}, \Sigma_1, C_1, I_1, T_1)$ and $A_2 = (S_2, S_{02}, \Sigma_2, C_2, I_2, T_2)$, with $C_1 \cap C_2 = \emptyset$
Define $A = A_1 \| A_2 = (S_1 \times S_2, S_{01} \times S_{02}, \Sigma_1 \cup \Sigma_2, C_1 \cup C_2, I, T)$, where:
- $I((s_1, s_2)) = I_1(s_1) \wedge I_2(s_2)$
- if $(s_1, a, g_1, R_1, s_1') \in T_1$ and $(s_2, a, g_2, R_2, s_2') \in T_2$ share the label a, then $((s_1, s_2), a, g_1 \wedge g_2, R_1 \cup R_2, (s_1', s_2')) \in T$

Continuous time: more precise than discrete time; appropriate for modeling asynchronous and transient behavior
E.g. delay element: propagates input to output
- if the input pulse is not shorter than l
- with delay at most u

State = pair of location and clock assignment
=> the state space is infinite (even uncountable)
But: we cannot observe behavior with arbitrary precision
- constraints from the automaton have integer time limits
- formulas of temporal logic also have integer constants
Questions:
- when are two states (s, v) and (s, v') with the same location, but different clock assignments equivalent?
- and is there a finite number of equivalence classes?
Two approaches:
- time regions => region graph = finite automaton
- time zones => geometric constraints, symbolic exploration

When are two states (s, v) and (s, v') equivalent?
1. if the same transitions can be taken from both states
- conditions on transitions can have arbitrary integer time bounds
  => clocks must have the same integer part in both states
- e.g. for a transition a with guard x > 4: (s, x = 4) cannot execute a, but (s, x = 4.1) can
  => fractional parts must both be nonzero or both zero
2. if they must execute transitions in the same order
consider transitions a with x > 2 and b with y > 3
from state (s, x = 1.5, y = 2.7) b becomes enabled before a
from a state in which the fractional part of x is larger than that of y, a becomes enabled before b
=> such states are not equivalent
=> clocks must have the same ordering for fractional parts

[Alur & Dill '90]: Define $v \cong v'$ if:
- $\forall x \in C:\ \lfloor v(x) \rfloor = \lfloor v'(x) \rfloor \ \vee\ (\lfloor v(x) \rfloor > c_x \wedge \lfloor v'(x) \rfloor > c_x)$
  where $c_x \in \mathbb{N}$ is the largest constant with which x is compared in the automaton
  (integer parts of clocks are either equal in both assignments or both exceed the largest constant)
- $\forall x, y \in C$ with $v(x) \le c_x$ and $v(y) \le c_y$: $\{v(x)\} \le \{v(y)\} \Leftrightarrow \{v'(x)\} \le \{v'(y)\}$, and $\{v(x)\} = 0 \Leftrightarrow \{v'(x)\} = 0$
  (where $\{\cdot\}$ denotes the fractional part)
=> region associated with state (s, v) = set of states (s, v') with $v \cong v'$
=> representation with a finite number of equivalence classes

Ex.: regions for two clocks and maximal constant c = 3. Regions are:
- 0-dimensional: points with integer coordinates, $x, y \in \{0, 1, 2, 3\}$
- one-dimensional: segments, diagonals; open-ended segments (> 3)
- two-dimensional: bounded (triangles) or not (rectangular stripes)
From two states (points) in the same region:
- the same transitions can be executed
- by passage of time, the same regions are traversed

Region graph [Alur, Courcoubetis, Dill '90]
For the timed automaton A, define the finite-state automaton R(A):
- states of R(A) are regions
- there is a transition between r and r' if and only if
  r' is the successor region of r with respect to time passage, or
  there is an action transition $(s, v) \xrightarrow{a} (s', v')$ between two representatives (timed states) $(s, v) \in r$ and $(s', v'
) \in r'$
Can prove: TCTL model checking for a timed automaton reduces to CTL model checking for the region graph (with additional clocks for time bounds on operators)
Size of the region graph: bounded by $|C|! \cdot 2^{|C|} \cdot \prod_{x \in C} (2 c_x + 2)$
- exponential in the number of clocks
- exponential in the value of the maximal time constant

Zones
Region graph: exponential in the number of clocks => often costly to build and analyze
=> alternative representation with temporal inequalities
timed zone = condition from B(C)
a zone = a convex union of regions

Consider zones which are maximal with respect to passage of time in a location (up to the time limit imposed by the invariant)
=> initial zones: $(s_0, I(s_0) \wedge \bigwedge (x_i = x_j))$ with $s_0 \in S_0$, $x_i, x_j \in C$
Successors of a zone $\phi$ by a combined action + time transition:
- conjunct with the guard g of the transition: $\phi \wedge g$
- reset the clocks associated with the transition: $\phi[x := 0] = \exists x\, \phi \wedge (x = 0)$
  (existential quantification over $x \in \mathbb{R}$, and then conjunction with x = 0)
- take into account passage of time: $\phi^{\Uparrow} = \exists t \ge 0.\ \phi(v - t)$
  (eliminate the upper-bound inequalities $x \le d$)
- impose the invariant of the destination state (conjunct with $I(s')$)
Overall: $succ(\phi) = ((\phi \wedge g)[R := 0])^{\Uparrow} \wedge I(s')$

Difference bound matrices
A zone can be represented as a square matrix of size |C| + 1 (one line for each clock and one line for comparing with zero)
Matrix elements are integers from the interval [-c, c]: value d for element (x, y) ($x, y \in C$) means $x - y \le d$
a bound beyond the maximal constant $c_{max}$ becomes $\infty$

In a program, numbers have limited memory space (a few bytes); reals also have limited precision
To correctly work with numbers, we must understand:
- representation and storage in memory
- limitations
- errors
limitations => errors
Any value (parameter, variable, also constant) needs to be represented in memory and takes up some program space
bit = unit of data storage that may hold two values (0 or 1)
need not be individually addressable (can't refer to just one bit)
byte = addressable unit of data storage that may hold a character
formed of
bits: CHAR_BIT >= 8 bits (limits.h)
8 bits in all usual architectures
the sizeof operator: gives the size of a type or value in bytes
sizeof (type) or sizeof expression
sizeof(char) is 1; for Unicode and wide character support: uchar.h, wctype.h
an int has sizeof(int) bytes => CHAR_BIT * sizeof(int) bits
ALL ints (big and small) take up sizeof(int) bytes!
sizeof is NOT a function; evaluated (if possible) at compile-time

in memory, numbers are represented in binary (base 2)
Unsigned: for N bits, the value is computed as
$c_{N-1} c_{N-2} \ldots c_1 c_0\,_{(2)} = c_{N-1} \cdot 2^{N-1} + \ldots + c_1 \cdot 2^1 + c_0 \cdot 2^0$
$c_{N-1}$ = most significant (higher-order) bit (MSB)
$c_0$ = least significant (lower-order) bit (LSB)
Range of values: from 0 to $2^N - 1$, e.g. 11111111 is 255
LSB $c_0 = 0$ => even number; $c_0 = 1$ => odd number
Signed: MSB is the sign; N-1 bits of value: several encodings
i) sign-magnitude: if MSB is 1, take the value part as negative
ii) one's complement: the sign bit counts as $-(2^{N-1} - 1)$
iii) two's complement: the sign bit counts as $-2^{N-1}$
Range for two's complement is from $-2^{N-1}$ to $2^{N-1} - 1$
$c_{N-1} c_{N-2} \ldots c_1 c_0\,_{(2)} = -c_{N-1} \cdot 2^{N-1} + c_{N-2} \cdot 2^{N-2} + \ldots + c_0 \cdot 2^0$
(0..127 stay the same; 128..255 become -128..-1)
8-bit: 11111111 is -1, 11111110 is -2, 10000000 is -128

Integer types
Before the type int one can write specifiers for:
size: short, long, since C99 also long long
sign: signed (implicit, if not present), unsigned
Can be combined; may omit int: e.g. unsigned short
char: signed char [-128, 127] or unsigned char [0, 255]
short, int: >= 2 bytes, must cover $[-2^{15}\ (-32768),\ 2^{15} - 1]$
long: >= 4 bytes, must cover $[-2^{31}\ (-2147483648),\ 2^{31} - 1]$
long long: >= 8 bytes, must cover $[-2^{63},\ 2^{63} - 1]$
Corresponding signed and unsigned types have the same size:
sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
DO use sizeof to find the storage taken up by a type / variable
DON'T write programs assuming a given type has 2, 4, 8, ... bytes: the program will fail on other systems
    #include <limits.h>
    #include <stdio.h>
    int main(void) {
        printf("%zu\n", sizeof(int));
        printf("%d\n", INT_MIN);
        printf("%u\n", UINT_MAX);
        return 0;
    }

Integer constants
base 10: as usual, e.g., -5
base 8: prefixed by 0 (zero): 0177 (127 decimal)
base 16: prefixed by 0x or 0X: e.g., 0x1aE (430 decimal)
Can't write in any other base: no binary constants such as 1101110
suffixes: u or U for unsigned, e.g., 65535u
l or L for long, e.g., 0177777L; ll or LL for long long

Character constants
printable: with single quotes: '0', '!', 'a'
special characters: '\0' nul, '\a' alarm, '\b' backspace, '\t' tab, '\n' newline, '\v' vert. tab, '\f' form feed, '\r' carriage return, '\"' double quote, '\'' quote, '\\' backslash
octal (max 3 digits): '\14'
hexadecimal (prefix x): '\xff'
0xFF: int 255; '\xff' may be -1 (type char may be signed)
The type of a character constant is int
char values are promoted (being of smaller size) to int in expressions
(this is why you don't see functions with char parameters)

Why bitwise operators?
- access the internal representation of data (e.g., numbers)
- encode / decode information (e.g. header fields in network packets or files; status values / commands from / to hardware)
- represent sets of small integers: one bit per element (1 = is member; 0 = is not member of the set)
  one 32-bit int for any set of ints 0..31 (4 billion combinations)
  Set operations: intersection = bitwise AND; union = bitwise OR; add element = set the corresponding bit
- date and time can be represented using bits:
  min, sec (0-59): 6 bits each; hour (0-23): 5 bits; day (1-31): 5 bits; month (1-12): 4 bits; year: 6 bits left from 32: 1970-2033
  => need operations to get day / month / year from the 32-bit value

Bitwise operators
Can be used ONLY for integer operands! Not float!
All operators work with ALL bits independently (not just one bit!)
They DON'T change the operands, just give a result (like +, *, etc.)
&   bitwise AND         (1 only if both bits are 1)
|   bitwise OR          (1 if at least one of the bits is 1)
^   bitwise XOR         (1 if exactly one of the bits is 1)
~   bitwise complement  (opposite value: 0 <-> 1)
<<  left shift with the number of bits in the second operand
    vacated bits are filled with zeros; leftmost bits are lost
>>  right shift with the number of bits in the second operand
    vacated bits are filled with zero if the number is unsigned or nonnegative
    else implementation-dependent (usually repeats the sign bit)
Examples:
    01101010 & 10101101 = 00101000
    01101010 | 10101101 = 11101111
    01101010 ^ 10101101 = 11000111
    ~01101010 = 10010101
    00111010 << 2 = 11101000
    10101000 >> 2 = 00101010 (unsigned)
Right shift is portable only for unsigned or nonnegative numbers!
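The date/time packing described above (6 + 6 + 5 + 5 + 4 + 6 = 32 bits) can be sketched with shifts and masks. The field order inside the 32-bit word and all function names are assumptions made for this illustration; the slides only give the field widths and the year range 1970-2033.

```c
#include <stdint.h>

/* Assumed layout (low to high bits):
 * sec: bits 0-5, min: 6-11, hour: 12-16, day: 17-21,
 * month: 22-25, year - 1970: 26-31. */
uint32_t pack_time(unsigned year, unsigned month, unsigned day,
                   unsigned hour, unsigned min, unsigned sec)
{
    return (uint32_t)(year - 1970) << 26
         | (uint32_t)month << 22
         | (uint32_t)day << 17
         | (uint32_t)hour << 12
         | (uint32_t)min << 6
         | (uint32_t)sec;
}

/* Each getter shifts the field to bit 0 and masks off the rest. */
unsigned get_year(uint32_t t)  { return (unsigned)(t >> 26) + 1970; }
unsigned get_month(uint32_t t) { return (t >> 22) & 0xF; }
unsigned get_day(uint32_t t)   { return (t >> 17) & 0x1F; }
unsigned get_hour(uint32_t t)  { return (t >> 12) & 0x1F; }
unsigned get_min(uint32_t t)   { return (t >> 6) & 0x3F; }
unsigned get_sec(uint32_t t)   { return t & 0x3F; }
```

The sketch does not validate its arguments; a value that exceeds its field width would spill into the neighboring field, so real code would mask or range-check each input first.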
Printing a number in octal
    void printoct(unsigned n) {
        if (n >= 8) printoct(n / 8);
        putchar('0' + n % 8);
    }
$8 = 2^3$ => each octal digit corresponds to a group of 3 bits
e.g. one hundred is 0...001100100 ($8^2 + 4 \cdot 8 + 4$)
=> can use bit operators to isolate parts
    void printoct(unsigned n) {
        unsigned n1 = n >> 3;
        if (n1) printoct(n1);
        putchar('0' + (n & 7));
    }
Likewise, can use groups of 4 bits to obtain hex digits
careful to get either '0'..'9' or 'A'..'F' for printing

Printing the bits of a number
Bitwise operators work with ALL bits
But, by choosing the appropriate operation and operand ("mask"), we can test a single bit
1) shift the mask, starting from the topmost bit
    for (unsigned m = ~(~0u >> 1); m; m >>= 1)
        putchar(n & m ? '1' : '0');
2) constant mask, shift the number
    for (unsigned m = 1; m; m <<= 1, n <<= 1)
        putchar(n & ~(~0u >> 1) ? '1' : '0');
3) same, but directly check the sign bit
    for (unsigned m = 1; m; m <<= 1, n <<= 1)
        putchar((int)n < 0 ? '1' : '0');

1 << k is $2^k$ for k smaller than the bit width
n >> k has value $n / 2^k$ (integer division) for unsigned / nonnegative n
use this, NOT pow (which is floating-point!)
~(1 << k): only bit k is 0, the rest are 1
0 has all bits 0, ~0 has all bits 1 (= -1, since it's a signed int)
~ preserves signedness, so ~0u is unsigned (UINT_MAX)
Bit ops produce results (like +, *, etc.), they don't change their operands
Value given by bits 0-3 of n: AND with 0...01111(2):   n & 0xF
Reset bits 2, 3, 4: AND with ~0...011100(2):           n &= ~0x1C
Set bits 1-4: OR with 11110(2):                        n |= 0x1E    n |= 036
Flip bits 0-2 of n: XOR with 0...0111(2):              n ^= 7
=> choose the fitting operator and mask (easier written in hex / octal)

~0 (signed) or ~0u (unsigned):   integer with all bits 1
~0 << k:                         k rightmost bits 0, rest 1
~(~0 << k):                      k rightmost bits 1, rest 0
~(~0 << k) << p:                 has k bits of 1, starting at bit p, rest 0
(n >> p) & ~(~0 << k):           n shifted p bits, reset all except the last k
n & (~(~0 << k) << p):           reset all except k bits starting at bit p

We have discussed scope (visibility) and lifetime (storage duration)
Linkage: how do the same names in different scopes / files link?
identifiers declared with the keyword static have internal linkage (are not linked to objects with the same name in other files)
Storage duration if declared static is the lifetime of the program
static in a function: local scope, but preserves its value between calls
initialization done only once, at the start of the lifetime
    int counter(void) {
        static int cnt = 0;
        return cnt++;
    }
    int main(void) {
        printf("%d\n", counter());
        printf("%d\n", counter());
        return 0;
    }

Floating point
Similar to scientific representation in base 10: $6.022 \cdot 10^{23}$, $1.6 \cdot 10^{-19}$: sign, decimals, exponent of 10
in the computer: sign; exponent; mantissa (significand)
$(-1)^{sign} \cdot 2^{exp} \cdot 1.mantissa_{(2)}$
the 1 before the mantissa is implicit (not in the bit pattern)
exp is chosen to have a leading 1
For float (23 mantissa bits), the next value after 1 is $1 + 2^{-23}$ (last bit of mantissa is 1)
For larger numbers, imprecision grows
e.g., $2^{24} + 1 = 2^{24} \cdot (1 + 2^{-24})$, the last 1 bit does not fit in the mantissa
=> float can represent $2^{24}$ and $2^{24} + 2$, but $2^{24} + 1$ is rounded (to $2^{24}$ in the default round-to-nearest-even mode)
FLT_EPSILON 1.19209290e-07F
DBL_EPSILON 2.2204460492503131e-16
for E = 0, small (denormalized) numbers: $(-1)^s \cdot 2^{-126} \cdot 0.M_{(2)}$
also: representations for ±∞, errors (NaN)
Use double for sufficient precision in computations!
math.h functions take and return double; variants with suffix: sin, sinf, sinl
The C standard also specifies rounding directions, exceptions, traps, etc.

Pitfalls: integer range
int (even long) may have a small range (32 bits: ± 2 billion)
Not enough for computations with large integers (factorial, etc.)
Use a wider type (bigger range) or arbitrary-precision libraries (bignum)

Pitfalls: floating-point precision
Floating point has limited precision: beyond 1E16, double does not distinguish two consecutive integers!
A decimal value may not be precisely represented in base 2; it may be a periodic fraction: $1.2_{(10)} = 1.(0011)_{(2)}$
    printf("%f", 32.1f);    writes 32.099998
Due to precision loss in computation, a result may be inexact:
replace the test x == y with fabs(x - y) < eps, for a small tolerance eps

Pitfalls: signed and unsigned
A comparison that mixes int and unsigned operands may give a surprising result
compile with -Wconversion and -Wsign-compare or -Wextra

Implicit conversions (summary of previous rules)
- integer to floating point, smaller type to larger type
- integer promotions: short, char, bool to int
Conversions in assignment: the value is truncated if the lvalue is not large enough
    char c; int i; c = i;
!!! The right-hand side is evaluated independently of the left-hand side !!!
    int eur_rol = 43000, usd_rol = 31000;
    double eur_usd = eur_rol / usd_rol;    (integer division happens before assignment to double)
Floating point is truncated towards zero when assigned to int (fractional part disappears)
Explicit conversion (cast): (typename) expression
converts expression as if assigned to a value of the given type
    eur_usd = (double)eur_rol / usd_rol;

char may be signed or unsigned (implementation dependent, check CHAR_MIN: 0 or SCHAR_MIN)
different int conversion if bit 7 is 1 ('\xff' = -1)
getchar / putchar work with unsigned char converted to int

Overflow: almost any arithmetic operation can cause overflow
    printf("%d\n", 1222000333 + 1222000333);    // -1850966630
(if int is 32-bit, the result has higher-order bit 1, and is considered negative)
    printf("%u\n", 2154000111u + 2154000111u);  // overflow: 13032926
Careful when comparing / converting signed and unsigned:
    if (-5 > 4333222111u) ...
the condition is true, because -5 converted to unsigned has a higher value
Correct comparison between int i and unsigned u:
    if (i >= 0 && i >= u)    (compares i and u only if i is nonnegative)
Check for overflow on the integer sum int z = x + y:
    if (x > 0 && y > 0 && z < 0) ...    overflow
    if (x < 0 && y < 0 && z >= 0) ...   overflow
DON'T right-shift a negative int!
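The overflow check and the signed/unsigned comparison above can be sketched as small helpers. The function names are assumptions for illustration. Unlike the test on `z` shown on the slide, `add_overflows` checks before adding, because signed overflow itself is undefined behavior in C.

```c
#include <limits.h>
#include <stdbool.h>

/* True if x + y would not fit in an int. The test is done on the
 * operands, so the overflowing addition is never executed. */
bool add_overflows(int x, int y)
{
    if (y > 0)
        return x > INT_MAX - y;
    if (y < 0)
        return x < INT_MIN - y;
    return false;
}

/* Mathematically correct "i < u" for int i and unsigned u:
 * a negative i is smaller than any unsigned value; otherwise
 * the conversion of i to unsigned preserves its value. */
bool int_less_than_unsigned(int i, unsigned u)
{
    return i < 0 || (unsigned)i < u;
}
```

For example, `int_less_than_unsigned(-5, 3u)` is true, whereas the plain expression `-5 < 3u` converts -5 to a large unsigned value and yields false.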
    for (; n; n >>= 1) ...
May loop forever if n is negative; the topmost bit inserted is usually the sign bit (implementation-defined)
Use unsigned (inserts a 0)
DON'T shift with more than the bit width (behavior undefined)
AND with a one-bit mask is not 0 or 1, but 0 or nonzero: n & (1 << k) is either 0 or 1 << k
(n >> k) & 1 is either 0 or 1

Iterative computation stops when the transformation produces no change
No change: f(x) = x, a fixpoint of the applied transformation
    foreach s do Out(s) = ⊤                        // no info
    W = {entry}                                    // worklist
    do
        choose s ∈ W                               // statement to be considered
        W = W \ {s}                                // remove from worklist
        old = Out(s)                               // save current value of interest
        In(s) = join Out(s') forall s' ∈ pred(s)   // update inputs
        Out(s) = Transfer_s(In(s))                 // apply meaning of statement
        if Out(s) ≠ old then                       // recompute affected successors
            forall s' ∈ succ(s) do W = W ∪ {s'}
    while W ≠ ∅
Values at each statement change monotonically
if the universe of values is finite and the computed functions are monotone, the worklist algorithm terminates
Often: sets of boolean properties => bitvectors
is a variable live at line i? does the definition at line i reach line j?
    1   a = 0;
    2   do {
    3       b = a + 1;
    4       c = c + b;
    5       a = 2 * b;
    6   } while (...);

An imprecise analysis may produce a spurious warning
    int f(int x, int y, int z) {
        int r;
        if (x > 0) { r = y + 2 * x; }
        else { r = x + 3 * z; }
        return r;
    }
Establish relations between inputs and outputs
forward computation
or backward computation (what values could have produced the current result?)
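The worklist iteration and the bitvector encoding above can be sketched for live variables, which is a backward analysis, so the roles of predecessors and successors are switched. The control-flow graph below is an assumption: it models the numbered example as six statements, with a hypothetical loop test on `a` and a final `return c` that do not appear in the slide text. All names are illustrative.

```c
#include <stdbool.h>

#define NSTMT 6

/* One bit per variable: a dataflow fact is a set of variables. */
enum { VAR_A = 1u << 0, VAR_B = 1u << 1, VAR_C = 1u << 2 };

struct stmt {
    unsigned use, def;   /* variables read / written by the statement */
    int nsucc;           /* number of successors */
    int succ[2];         /* successor statement indices */
};

/* Assumed CFG: a = 0; do { b = a+1; c = c+b; a = 2*b; } while (test on a); return c; */
static const struct stmt prog[NSTMT] = {
    { 0,             VAR_A, 1, { 1, 0 } },   /* 0: a = 0          */
    { VAR_A,         VAR_B, 1, { 2, 0 } },   /* 1: b = a + 1      */
    { VAR_B | VAR_C, VAR_C, 1, { 3, 0 } },   /* 2: c = c + b      */
    { VAR_B,         VAR_A, 1, { 4, 0 } },   /* 3: a = 2 * b      */
    { VAR_A,         0,     2, { 1, 5 } },   /* 4: loop test on a */
    { VAR_C,         0,     0, { 0, 0 } },   /* 5: return c       */
};

/* Computes live_in[s] = use(s) | (live_out(s) & ~def(s)), where
 * live_out(s) is the union of live_in over the successors of s. */
void live_variables(unsigned live_in[NSTMT])
{
    bool on_list[NSTMT];
    int pending = NSTMT;

    for (int s = 0; s < NSTMT; ++s) {
        live_in[s] = 0;          /* start from the empty set */
        on_list[s] = true;       /* every statement is visited at least once */
    }
    while (pending > 0) {
        for (int s = NSTMT - 1; s >= 0; --s) {
            if (!on_list[s])
                continue;
            on_list[s] = false;
            --pending;

            unsigned out = 0;    /* meet = union over successors (may analysis) */
            for (int k = 0; k < prog[s].nsucc; ++k)
                out |= live_in[prog[s].succ[k]];

            unsigned in = prog[s].use | (out & ~prog[s].def);
            if (in == live_in[s])
                continue;        /* no change: nothing to propagate */
            live_in[s] = in;

            /* a change at s affects its predecessors: put them back on the list */
            for (int p = 0; p < NSTMT; ++p)
                for (int k = 0; k < prog[p].nsucc; ++k)
                    if (prog[p].succ[k] == s && !on_list[p]) {
                        on_list[p] = true;
                        ++pending;
                    }
        }
    }
}
```

Under these assumptions, `c` is live on entry to the first statement (it is read before being written), while `a` is not, since statement 0 overwrites it.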
need to deal with loops: assume bounds on loop iterations, or try to compute fixpoints
context-insensitive: function summary computed once, independently of the call site
context-sensitive: function summary specialized for each call site
tradeoff: precision for complexity

Engler, Chelf, Chou, Hallem: Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions, OSDI 2000 (best paper)
went on to build Coverity
many other papers on simple, small, efficient static checkers
for mining error patterns
for concurrent programs etc.

Elements of Mathematical Logic (Formal Verification, Lecture 6, Marius Minea), November 10, 2005
- Propositional calculus
- Predicate calculus
- Decision procedures
- Resolution theorem proving

Symbols of propositional logic: atomic propositions p, q, r, ...; connectives $\neg$ and $\rightarrow$; parentheses ( )
Formulas of propositional logic:
- any atomic proposition is a formula
- if $\alpha$ is a formula, then $(\neg\alpha)$ is a formula
- if $\alpha$ and $\beta$ are formulas, then $(\alpha \rightarrow \beta)$ is a formula
Other known operators can be introduced as shorthands:
- $(\alpha \wedge \beta) \stackrel{def}{=} (\neg(\alpha \rightarrow (\neg\beta)))$
- $(\alpha \vee \beta) \stackrel{def}{=} ((\neg\alpha) \rightarrow \beta)$
- $(\alpha \leftrightarrow \beta) \stackrel{def}{=} ((\alpha \rightarrow \beta) \wedge (\beta \rightarrow \alpha))$
Simplified notation: without redundant parentheses; precedence order defined as: $\neg, \wedge, \vee, \rightarrow, \leftrightarrow$; $\rightarrow$ is right-associative

A valuation v is a function defined for all propositional formulas, with values in {T, F}, such that:
- v(p) is defined for any atomic proposition p
- $v(\neg\alpha) = T$ if $v(\alpha) = F$, and F if $v(\alpha) = T$
- $v(\alpha \rightarrow \beta) = F$ if $v(\alpha) = T$ and $v(\beta) = F$, and T otherwise
An interpretation = a valuation for the atomic propositions of a formula
An interpretation satisfies a formula if the latter is evaluated to T (we say that the interpretation is a model for that formula)
valid formula (tautology): true in all interpretations
satisfiable formula: true in at least one interpretation
unsatisfiable formula (contradiction): false in any interpretation

Semantic approach: based on $H \models \varphi$ (logical truth)
A set of formulas H implies a
formula $\varphi$ if any truth function that satisfies H (i.e., all formulas in H) also satisfies $\varphi$
Syntactic approach (deduction): based on syntactic manipulation of formulas:
$\varphi$ is a theorem provable from a set of hypotheses, using axioms and deduction rules
Axiom schemes for propositional logic:
A1: $(\alpha \rightarrow (\beta \rightarrow \alpha))$
A2: $((\alpha \rightarrow (\beta \rightarrow \gamma)) \rightarrow ((\alpha \rightarrow \beta) \rightarrow (\alpha \rightarrow \gamma)))$
A3: $(((\neg\beta) \rightarrow (\neg\alpha)) \rightarrow (((\neg\beta) \rightarrow \alpha) \rightarrow \beta))$
(called schemes (schemata) because axioms are obtained by substituting particular formulas of propositional logic)
We introduce a single deduction rule (modus ponens, MP):
From the formulas $\varphi$ and $(\varphi \rightarrow \psi)$ we can deduce $\psi$
Let H be a set of formulas. We call deduction from H a sequence of formulas $A_1, A_2, \ldots, A_n$, such that each $A_i$:
1. is an axiom, or
2. is a formula from H, or
3. follows by MP from two previous sequence items $A_j$, $A_k$ where j, k < i

Any consistent set of formulas can be extended to a maximally consistent set (adding any other formula makes it inconsistent)
A set of formulas is consistent if and only if it is satisfiable

The symbols of a first-order language are:
- parentheses ( )
- logical connectors $\neg$ and $\rightarrow$
- the quantifier $\forall$ (universal quantifier)
- a set of identifiers $v_0, v_1, \ldots$ for variables
- a (possibly empty) set of symbols for constants
- for any $n \ge 1$ a set of n-ary function symbols (of n arguments)
- for any $n \ge 1$ a set of n-ary predicate (relation) symbols
First-order languages with equality: contain = as a special symbol in addition to the above

Terms of a first-order language (defined by structural induction):
- any variable symbol $v_n$
- any constant symbol c
- $f(t_1, \ldots, t_n)$, if f is an n-ary function symbol and $t_1, \ldots, t_n$ are terms
Formulas of a first-order language:
- $P(t_1, \ldots, t_n)$, where P is an n-ary predicate and $t_1, \ldots, t_n$ are terms
- $t_1 = t_2$, where $t_1$ and $t_2$ are terms (for languages with equality)
- $\neg\alpha$, where $\alpha$ is a formula
- $\alpha \rightarrow \beta$, where $\alpha, \beta$ are formulas
- $\forall v_n\, \alpha$, where $\alpha$ is a formula

Extending the valuation s to terms and formulas we obtain a truth function (valuation) for all formulas in the language
We write $I \models s(\varphi)$ if $\varphi$ is true in interpretation I under valuation s, and $I \models \varphi$ if $I \models$
$s(\varphi)$ for any valuation s

Define: variable x can be substituted with term t in a formula if no variable of t becomes bound by a quantifier of the formula

Axioms of predicate calculus:
A1: $(\alpha \rightarrow (\beta \rightarrow \alpha))$
A2: $((\alpha \rightarrow (\beta \rightarrow \gamma)) \rightarrow ((\alpha \rightarrow \beta) \rightarrow (\alpha \rightarrow \gamma)))$
A3: $(((\neg\beta) \rightarrow (\neg\alpha)) \rightarrow (((\neg\beta) \rightarrow \alpha) \rightarrow \beta))$
A4: $(\forall x(\alpha \rightarrow \beta) \rightarrow (\forall x\,\alpha \rightarrow \forall x\,\beta))$
A5: $(\forall x\,\alpha \rightarrow \alpha[x \leftarrow t])$ if x can be substituted with t in $\alpha$
A6: $(\alpha \rightarrow \forall x\,\alpha)$ if x does not appear free in $\alpha$
For equality, we also add
A7: $x = x$
A8: $x = y \rightarrow (\alpha \rightarrow \beta)$, where $\beta$ is obtained from $\alpha$ by replacing arbitrarily many occurrences of x with y

Let H be a set of formulas and $\varphi$ a formula. We say that H implies $\varphi$ ($H \models \varphi$) if any interpretation and valuation that satisfy H also satisfy $\varphi$

Deciding satisfiability of a set S of clauses
    function Satisfiable(S)
        repeat
            for each unit clause L in S
                delete from S all clauses that contain L
                delete ¬L from all clauses
            if S is empty return TRUE
            elsif S contains the empty clause return FALSE
        until no more changes
        choose a literal L from S for decomposition (true / false)
        if Satisfiable(S ∪ {L}) return TRUE
        elsif Satisfiable(S ∪ {¬L}) return TRUE
        else return FALSE

Theorem provers
Great variety:
- for proving results from mathematics
- for system verification (especially programs)
Generally, implemented for higher-order logics
- allow types described by means of predicates
- have inductive capabilities
Basic approaches to proving:
- forward chaining (derive theorems getting closer to the goal)
- or backwards chaining (generate intermediate conclusions for the given goal)
- application of inference rules: controlled by ...

Clausal form
Any formula without free variables in predicate calculus can be written in clausal form in a sequence of 8 steps
Example: start with
$\forall x[P(x) \rightarrow \exists y(D(x, y) \wedge \neg(E(f(x), y) \vee E(x, y)))] \wedge \neg\forall x P(x)$
(1) Eliminate all connectors except $\wedge, \vee, \neg$:
$\forall x[\neg P(x) \vee \exists y(D(x, y) \wedge \neg(E(f(x), y) \vee E(x, y)))] \wedge \neg\forall x P(x)$
(2) Move all negations inwards until they reach predicates:
$\forall x[\neg P(x) \vee \exists y(D(x, y) \wedge \neg E(f(x), y) \wedge \neg E(x, y))] \wedge \exists x \neg P(x)$
(3) Rename variables, with a unique
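name for each quantifier:
$\forall x[\neg P(x) \vee \exists y(D(x, y) \wedge \neg E(f(x), y) \wedge \neg E(x, y))] \wedge \exists z \neg P(z)$
(4) Eliminate existential quantifiers (skolemize)
For $\exists y$ within a quantifier $\forall x$, create a Skolem function y = g(x) (the value of y depends in general on the value of x)
Otherwise, choose a new Skolem constant
$\forall x[\neg P(x) \vee (D(x, g(x)) \wedge \neg E(f(x), g(x)) \wedge \neg E(x, g(x)))] \wedge \neg P(a)$

Arrays
Use a named constant for the array length: avoid forgetting to change it somewhere
    #define MAX 100
    int main(void) {
        unsigned p[MAX] = {2};
        for (unsigned cnt = 1, n = 3; cnt < MAX; n += 2) {
            unsigned maxdiv = sqrt(n);
            for (int j = 0; n % p[j]; ++j)
                if (p[j] >= maxdiv) { p[cnt++] = n; break; }
        }
        ...
    }
Array indices are not checked => lots of room for errors

Arrays as function arguments
As function argument, the address of the array is passed
it carries no length information => typically, the length is given as another parameter
in the parameter declaration, the length does not matter; it only confuses the reader
neither compiler nor runtime can check or know the length!
    void printtab(int t[], unsigned len)

Parameters are passed by value
    void inc(int x) { printf("%d\n", ++x); }
    int main(void) {
        int y = 5;
        inc(y);
        printf("%d\n", y);
        return 0;
    }
inc() is NOT called with the variable y
it is called with the value 5
it doesn't know it was called as inc(y)
We cannot make it know
Parameter x can be assigned, but its lifetime is the function block
Values are NOT passed back through parameters
what would you do returning from inc(y*y+y+7) ???
But: having an address, a function may read and write the object at that address
For arrays: the address is passed, and the array elements can be changed by the function

Example: average of the array elements that are at least 5
    ... if (a[i] >= 5) { s += a[i]; ++num; }
    ... return num ?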
s   num : 0; Division by 0 would return NAN (not a number, math h) we return a value (0) distinct from any normal result (> 5) { ( (a[i] 1; } ( [], = len; i—;) 5) 0; Which is the smallest prime factor of n NP 11 [NP] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31}; ( ) = 751, p = n; ( = 0; i ( ) = {0}; ((c = getcharO) != EOF) ++frq[c] ; ( =0; car ( , ) { [n] ; memset(seen, 0, (seen)); printf( , m n); (; m 70= n; m *= 10) { putchar(10*m n + ); (seen[m]) { printf( ); ; } seen[m] = 1; putchar( ); ( ) { (5, 28); 0; word ; name = { , , }; in C, are character sequences terminated in memory by the ’ 0’ character (nuli character, code 0) also end with terminator uses one extra memory byte but is not counted as part of string length (strlen) msg[] = ; msg[] = { , , , , str = ; For initialized strings without explicit dimension (msg above), allocated size is that of initializer, plus All for strings need strings Strings are character arrays terminated by This is how we can find out their length Not so for other arrays We must pass the length to any function ( [], ); For strings, just pass the string, and traverse until ( S [] ) { ( = 0; s[i]; ++i) { ( s[]) { = 0; (s[ij) ++i; i; DON’T write f er—(int i=0;—i ( ) { ; printf( printf( , &d); , a) ; 0; The result of an address operation has a , like any expression For a variable read: sometype x; type of address &x is ; i e , an address of an object of that type ; type of &n is c; type of &c is * say: pointer, pointer to integer * say: pointer, pointer to character in tab[LEN] ; the tab has type int a ; a has type char *t ; t has type A function declaration means (becomes) ( ( *s) s П) or ( П) or ( * ) etc the ( [ ]) The value (0 of type must know all dimensions except first: Ацпхю x Вюхб = G nx6 ( [] , [] , [] , ) ( = 0; i lengths as parameters, arrays that use them: ( [m] [p]); [m] [n], [n] [p] , Overflow by not checking loop limit: , a ; printf( ); scanf( , &n); ( = 0; i allows guaranteed (certified) results within modeling 
assumptions (compiler, libraries, OS, hardware)
  verification conditions (from Floyd-Hoare rules)
  provers or satisfiability checkers (SAT solvers)
  may need human hints / annotations for complex cases
  intense interaction with human expert

Model checking
  system = finite-state automaton
  algorithm = explore state space (graph traversal)
  automated; gives counterexample in case of error
  challenge: state space explosion
  developed from 1981 (Clarke & Emerson; Sifakis - Turing award 2007)
  initially applied to hardware and small concurrent programs

Example: Peterson's mutual exclusion algorithm
  while (1) {                              while (1) {
    L1: flag[0] = true;                      R1: flag[1] = true;
    L2: turn = 1;                            R2: turn = 0;
    L3: while (flag[1] && turn == 1);        R3: while (flag[0] && turn == 0);
    C0: flag[0] = false;                     C1: flag[1] = false;
  }                                        }
Can the programs simultaneously reach the critical section?
  labels C0 and C1, before setting the flag to false (freeing resource)

State space:
  variables: 3 bits: f0, f1, t, initially (?, ?, ?)
  program counters (2 threads) => cartesian product: pairs (pc0, pc1)
Explicit representation: 2^3 · 5 · 5 states
Not all states are reachable (feasible)
Can we reach a state with pc0 = C0, pc1 = C1?
Answer: explore the state space
  forward, from initial state (L1, R1, ?, ?, ?): is a bad state reachable?
  or backward, from error state (C0, C1, ?, ?, ?): is the initial state reachable?
A model checker implements traversal algorithms also for more complex properties (temporal logic)
Simplest property: reachability - is the error state reachable?
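The forward exploration described above can be sketched as an explicit-state breadth-first search. This is a minimal illustration, not the lecture's code: the state packing, the function names (pack, step, error_reachable) and the use of only four locations per thread (L1/R1, L2/R2, L3/R3, C0/C1) are assumptions made here, and the test at L3/R3 is treated as one atomic step. The guard parameter is also hypothetical: with guard set to false the busy-wait is skipped, which gives a deliberately broken protocol to compare against.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* State packed into 7 bits: pc0 (2 bits), pc1 (2 bits), f0, f1, t (1 bit each).
   Location numbering (an assumption of this sketch):
   0 = L1/R1, 1 = L2/R2, 2 = L3/R3, 3 = C0/C1 (critical section). */
enum { NSTATES = 128 };

static int pack(int pc0, int pc1, int f0, int f1, int t) {
    return (pc0 << 5) | (pc1 << 3) | (f0 << 2) | (f1 << 1) | t;
}

/* One step of thread id (0 or 1) from state s. Returns the successor state,
   or s itself if the thread is blocked in its busy-wait loop.
   With guard == false the wait at L3/R3 never blocks (broken variant). */
static int step(int s, int id, bool guard) {
    assert(id == 0 || id == 1);
    int pc[2] = { (s >> 5) & 3, (s >> 3) & 3 };
    int f[2]  = { (s >> 2) & 1, (s >> 1) & 1 };
    int t = s & 1;
    int other = 1 - id;
    switch (pc[id]) {
    case 0: f[id] = 1; pc[id] = 1; break;   /* flag[id] = true  */
    case 1: t = other; pc[id] = 2; break;   /* turn = other     */
    case 2:                                 /* while (flag[other] && turn == other); */
        if (!(guard && f[other] && t == other)) pc[id] = 3;
        break;
    case 3: f[id] = 0; pc[id] = 0; break;   /* flag[id] = false */
    }
    return pack(pc[0], pc[1], f[0], f[1], t);
}

/* Forward BFS from all 8 initial states (L1, R1, ?, ?, ?).
   Returns true if a state with both threads in the critical section is reachable. */
bool error_reachable(bool guard) {
    bool seen[NSTATES];
    int queue[NSTATES], head = 0, tail = 0;
    memset(seen, 0, sizeof seen);
    for (int v = 0; v < 8; ++v) {           /* pc0 = pc1 = 0, variables arbitrary */
        seen[v] = true;
        queue[tail++] = v;
    }
    while (head < tail) {
        int s = queue[head++];
        if (((s >> 5) & 3) == 3 && ((s >> 3) & 3) == 3) return true;
        for (int id = 0; id < 2; ++id) {
            int n = step(s, id, guard);
            if (!seen[n]) { seen[n] = true; queue[tail++] = n; }
        }
    }
    return false;
}
```

Calling error_reachable(true) explores the protocol as written on the slide, while error_reachable(false) explores the variant without the wait loop. Each state is enqueued at most once, so the queue never holds more than NSTATES entries.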
We know this from graph traversal (BFS, DFS)
  but there, the graph is explicit and pre-built
  must only follow pointers from node to node
Model checking usually starts from a model description in text (program)
  C, Java, dedicated specification / modeling language
No pre-existing graph of nodes, model must be built
  e.g. explicit-state, on-the-fly state-space exploration
  or symbolic: state sets and transition relation are formulas
    represented as binary decision diagrams (BDDs)
  may need to compose models (automata) for components

State sets are formulas over state variables:
  S_i = (pc0 = 1) ∧ (pc1 = 1) (initial); f0, f1, t arbitrary => 8 individual states
Transition: formula over state and next state
  pc0 = 1 ∧ pc0' = 2 ∧ f0' = 1 ∧ pc1' = pc1 ∧ t' = t ∧ f1' = f1
Transition relation: disjunction (∨) of all transitions
Next state set: all states s' such that s ∈ S_i ∧ step(s, s'), i.e., S_i(s) ∧ step(s, s')

A path of length k from initial state set S_i to target state (set) S_f must satisfy
  S_i(s0) ∧ step(s0, s1) ∧ ... ∧ step(s_{k-1}, s_k) ∧ S_f(s_k)
This means checking satisfiability of a Boolean formula
  NP-complete, but efficient algorithms in recent practice
Bounded model checking: if one can't explore the full state space,
  show that no error paths of length less than some k exist

Software model checking
Early: SPIN tool (own modeling language with guarded commands)
SLAM project [Microsoft Research] (starting 2000)
  (Software (Specifications), Languages, Analysis and Model checking)
later, many others: BLAST (UC Berkeley), CBMC (Oxford)
today: Software Verification Competition (5th edition, 2016)

Goal: checking safety properties (invariants)
  example: a program respects API usage rules
    calls to lock() and unlock() alternate
  used in practice for device drivers in Windows, Linux
  focused mostly on finding control / interface errors
Advantages:
- no need to annotate program by user (only specify rules to monitor - simple automata)
- checking is automatic, for all possible executions
- generates a counterexample (concrete execution) in case of error

Device driver fragment [Ball & Rajamani '01]
  do {
  A: KeAcquireSpinLock(&devExt->writeListLock);
     nPacketsOld = nPackets;
     request = devExt->WriteListHeadVa;
     if (request && request->status) {
       devExt->WriteListHeadVa = request->Next;
  B:   KeReleaseSpinLock(&devExt->writeListLock);
       irp = request->irp;
       if (request->status > 0) {
         irp->IoStatus.Status = STATUS_SUCCESS;
         irp->IoStatus.Information = request->Status;
       } else {
         irp->IoStatus.Status = STATUS_UNSUCCESSFUL;
         irp->IoStatus.Information = request->Status;
       }
       SmartDevFreeBlock(request);
       IoCompleteRequest(irp, IO_NO_INCREMENT);
       nPackets++;
     }
  } while (nPackets != nPacketsOld);
  C: KeReleaseSpinLock(&devExt->writeListLock);
Only the highlighted code (lock calls, nPackets updates) is relevant for correctness!

A lock may be represented as one bit:
  acquire and release change the bit value or signal error
  state {
    enum { Unlocked=0, Locked=1 } state = Unlocked;
  }
  KeAcquireSpinLock.return {
    if (state == Locked) abort;
    else state = Locked;
  }
  KeReleaseSpinLock.return {
    if (state == Unlocked) abort;
    else state = Unlocked;
  }
Given this lock model, the program is automatically instrumented
  (original program is correct iff instrumented program can't reach error)

Programs may be very complex
Many statements may be irrelevant for the property of interest
  => want to focus on the relevant program part
Program slicing [Weiser, 1981] determines the program fragment (slice) that affects a given property (slicing criterion)
  (e.g. value of a variable in a program point)
More generally: generate a simplified program (model) from whose analysis we derive properties of the initial program

Predicate abstraction
  predicate = boolean condition (expression with program variables)
  starts from the predicates in the specification
  nondeterministic branches
  skip (NOP) for irrelevant statements
  initially, keep just control flow, without data
  do {
  A: KeAcquireSpinLock();
     skip;
     if (*) {
  B:   KeReleaseSpinLock();
       if (*) { skip; } else { skip; }
     }
  } while (*);
  C: KeReleaseSpinLock();

Abstract program is an automaton: calculate reachable state set
  state = program counter + variable assignment
  state space: represented efficiently as boolean formula (binary decision diagram, BDD)
  computing with state sets: captures correlations between variables
  transition relation: is also a boolean formula: state = 0 ∧ state' = 1
For the given program, the model checker finds an error trace:
  may traverse A: KeAcquireSpinLock() twice successively if one never enters the if containing B: Release

We get an error trace in the abstract program (model)
  is it feasible in the original (concrete) program?
Map error trace onto original program
  = find input values that satisfy constraints for the chosen path (weakest preconditions)
  if counterexample (error trace) is feasible, it is a real error
  if counterexample is not feasible, abstraction was too coarse
    model must be refined and re-checked
In the given example, reproducing the counterexample fails
  program exits the while after the first loop
  => the loop condition is relevant for the analyzed property

We introduce a new predicate (boolean variable) representing the condition
  def b := (nPackets == nPacketsOld)
We generate a new boolean program => find statements depending on b
  Assignments nPacketsOld = nPackets and nPackets++ affect b
We determine when after an assignment we know the value of b (true/false)
  depending on all state bits (2^n combinations for n predicates, here n = 1)
Find weakest precondition for b, resp. !b after the given assignment
We use for short nP and nPO
  wp(nP++, b) = (nP+1 == nPO)
    b does not imply it: nP == nPO does not give nP+1 == nPO
    !b does not imply it: nP != nPO does not give nP+1 == nPO
  So regardless of b we can't be sure that after nP++, b will be true
  wp(nP++, !b) = (nP+1 != nPO)
    b implies it: nP == nPO gives nP+1 != nPO
    !b does not imply it: nP != nPO does not give nP+1 != nPO
  So if b then after nP++ we have !b, else we don't know
  => we may abstract nP++ with b = b ? F : nondet
Likewise, we may abstract nPO = nP with b = T

Regenerate boolean program with the new predicates, check again
  do {
  A: KeAcquireSpinLock();
     b = T;                     /* b == (nPackets == nPacketsOld) */
     if (*) {
  B:   KeReleaseSpinLock();
       if (*) { skip; } else { skip; }
       b = choose(F, b);        /* choose(p1, p2) == p1 ? T : p2 ? F : nondet */
     }
  } while (!b);
  C: KeReleaseSpinLock();
The new abstraction is fine-grained enough
Exploring all boolean program states, the model checker does not find an error path
  after B:Release, b becomes F, we stay in the cycle, can't execute C:Release again (we do A:Acquire)
  if we don't pass B:Release, b stays T, we exit the cycle, can't repeat A:Acquire (we do C:Release)
May need several abstraction steps; termination not guaranteed
In practice, model checking is feasible for control-rich programs: errors in drivers, Linux kernel, etc.

Formal Verification, Lecture 6, Marius Minea, November 10, 2005
Elements of Mathematical Logic
- Propositional calculus
- Predicate calculus
- Decision procedures
- Resolution theorem proving

Symbols of propositional logic: the connectors ¬ and →, parentheses ( ), and atomic propositions
Formulas of propositional logic:
- any atomic proposition is a formula
- if α is a formula, then (¬α) is a formula
- if α and β are formulas, then (α → β) is a formula
Other known operators can be introduced as shorthands:
- (α ∧ β) = (¬(α → (¬β)))
- (α ∨ β) = ((¬α) → β)
- (α ↔ β) = ((α → β) ∧ (β → α))
Simplified notation: without redundant parentheses; precedence order defined as: ¬, ∧, ∨, →, ↔; → is right-associative

A valuation (truth function) v is a function defined for all propositional formulas, with values in {T, F} such that:
- v(p) is defined for any atomic proposition p
[...]
An interpretation = a valuation for the atomic propositions of a formula
An interpretation satisfies a formula if the latter is evaluated to T
  (we say that the interpretation is a model for that formula)
valid formula (tautology): true in all interpretations
satisfiable formula: true in at least one interpretation
unsatisfiable formula (contradiction): false in any interpretation

Semantic approach: based on truth values (logical truth)
  A set of formulas H implies a formula φ if any truth function that satisfies H (i.e., all formulas in H) also satisfies φ
Deductive approach: based on syntactic manipulation of formulas: φ is a theorem provable
from a set of axioms, using deduction rules

Axioms (schemes) for propositional logic:
A1: (α → (β → α))
A2: ((α → (β → γ)) → ((α → β) → (α → γ)))
A3: (((¬β) → (¬α)) → (((¬β) → α) → β))
(called schemes (schemata) because axioms are obtained by substituting particular formulas of propositional logic)
We introduce a single deduction rule (modus ponens): from the formulas φ and φ → ψ we can deduce ψ
[...]

Symbols of a first-order language:
[...]
- the quantifier ∀ (universal quantifier)
- a set of identifiers v0, v1, ... for variables
- a (possibly empty) set of symbols for constants
- for any n ≥ 1 a set of n-ary function symbols (of n arguments)
- for any n ≥ 1 a set of n-ary predicates (relations)
First-order languages with equality: contain = as special symbol in addition to the above

Terms of a first-order language (defined by structural induction)
- any variable symbol v_n
- any constant symbol c
- f(t1, ..., tn), if f is an n-ary function symbol and t1, ..., tn are terms
Formulas of a first-order language:
- P(t1, ..., tn), where P is an n-ary predicate and t1, ..., tn are terms
- t1 = t2, where t1 and t2 are terms (for languages with equality)
- ¬α, where α is a formula
- α → β, where α, β are formulas
- ∀v_n φ, where v_n is a variable and φ is a formula

An interpretation (structure) I for the predicate language L consists of:
- a nonempty set U called the universe or the domain of I (the set of values which the variables can take)
- for any constant symbol c, a value c_I ∈ U
- for any n-ary function symbol f, a function f_I : U^n → U
- for any n-ary predicate symbol P, a subset P_I ⊆ U^n
Let I be an interpretation with universe U for L, and let V be the set of all variable symbols from L
A valuation is a function s : V → U
Extending the valuation s to terms and formulas we obtain a truth function (valuation) for all formulas in L
We write I ⊨ s(φ) [...]

A1: (α → (β → α))
A2: ((α → (β → γ)) → ((α → β) → (α → γ)))
A3: (((¬β) → (¬α)) → (((¬β) → α) → β))
A4: (∀x(α → β) → (∀xα → ∀xβ))
A5: (∀xα → α[x ← t]) [...]

The value NULL indicates an invalid address
  (0 of type void *, address of unspecified type)
  (used when an address value is required, but there is no valid address)
rtyp f(eltyp a[]) means rtyp f(eltyp *a)
An array name is an address, including string constants: "something"
  'a' is a char, but "a" is a string (char *)
A string (constant or not) is null-terminated ('\0')
  Functions that work with strings can thus know where strings end (no need for an extra length parameter)
  BUT: to compute string length must look at all chars (expensive)
Compare strings with strcmp, strncmp, NOT with ==
  == compares addresses (WHERE strings are), NOT their contents
  BUT: could use singleton strings for efficient comparison
A string constant such as "test" CANNOT be modified (do not pass it to a function that modifies its argument)

String functions (string.h):
  size_t strlen(const char *s);
  char *strchr(const char *s, int c);
  char *strstr(const char *big, const char *small);
  int strcmp(const char *s1, const char *s2);
  int strncmp(const char *s1, const char *s2, size_t n);
  char *strcpy(char *dest, const char *src);
  char *strncpy(char *dest, const char *src, size_t n);
  char *strcat(char *dest, const char *src);
  char *strncat(char *dest, const char *src, size_t n);
  size_t: unsigned integer type for sizes of objects
  const: type qualifier: object will not be changed
  void *memset(void *s, int c, size_t n);
  void *memcpy(void *dest, const void *src, size_t n);
  void *memmove(void *dest, const void *src, size_t n);
  size_t strspn(const char *s, const char *accept);
  size_t strcspn(const char *s, const char *reject);

Multidimensional arrays
Arrays with elements that are themselves arrays (matrix lines)
Declaration: type name[dim1][dim2]...[dimn];
Example: double m[6][8];
  m: array of 6 elements, each an array of 8 reals
Addressing an element: m[i][j]
Dimensions: constants (since C99: known at declaration point)
Array elements are consecutive in memory: m[i][j] is in position i*COL+j
  #define LIN 2
  #define COL 5
  int main(void) {
    int a[LIN][COL] = { {0, 1, 2 [...]
    for (int i = 0; i < LIN; ++i) [...]

Multidimensional array parameters: must know all
dimensions except first: A(n×10) × B(10×6) = C(n×6)
  void matmul(double a[][10], double b[][6], double c[][6], unsigned n);
Since C99: lengths as parameters, then arrays that use them:
  void matmul(unsigned m, unsigned n, unsigned p, double a[m][n], double b[n][p], double c[m][p]);

Buffer overflows
Marius Minea, marius@cs.upt.ro, 15 November 2017

String of vulnerabilities over the past 30 years
  still ongoing
  new protection solutions every year
Reason: lack of memory and type safety in low-level languages
  main culprits: C and C++ (unsafe inputs, pointer arithmetic, unsafe casts)
A classic paper: Aleph One, Smashing the stack for fun and profit, Phrack magazine 7(49), 1996

  void func(char *str) {
    char buffer[12];
    int variable_a;
    strcpy(buffer, str);
  }
  int main() {
    char *str = "I am greater than 12 bytes";
    func(str);
  }
Figure: (a) A code example (b) Active stack frame in func(): variable_a, buffer[0] ... buffer[11], previous frame pointer, return address, str (function argument), next stack frame
http://www.cis.syr.edu/~wedu/seed/Labs_12.04/Software/Buffer_Overflow/Buffer_Overflow.pdf

Return address slot overwritten
  on function return, execution jumps wherever that points to
For an exploit, must know:
  1) position of the return address slot relative to buffer start: i.e., buffer size and stack layout (calling convention)
  2) memory address of buffer (to fill in proper payload address)
Figure: (a) Jump to the malicious code (b) Improve the chance
http://www.cis.syr.edu/~wedu/seed/Labs_12.04/Software/Buffer_Overflow/Buffer_Overflow.pdf

Let's revisit exploit assumptions:
  can determine the address at which to inject the payload
  can overwrite the return address
  tampering is undetected
  can execute the payload code
Variants:
  a) overwrite base pointer rather than return address
     returns into attacker-crafted stack frame => then into exploit
  b) overwrite C++ exception handling pointers (stored on stack), and cause an exception

Option 1: detect change
  check at function return if the RET address was altered
Two basic ideas:
  Check the return address itself => need a copy of the correct value
  Check bytes next to (before) the return address => canary
    terminator canary: 0, CR, LF, EOF
    random canary (created at process startup time)
      (attacker doesn't know it => can't put it back)
    random XOR canary (XOR'ed with protected control data - if it changes, canary will be wrong)
  Checks inserted by compiler where needed

Option 2: hamper execution
  Attacker must execute injected code:
    Non-executable stack / write XOR execute (operating system support)
  Attacker must know the address to jump to:
    Address Space Layout Randomization
    good, but ineffective against brute force

Return to libc
  Typical attack is to call exec or some other library function
  instead of executing code (call exec), put address (and parameters) of libc function on stack
  http://geekscomputer.blogspot.ro/2008/12/buffer-overflows.html
  Can chain calls - put multiple library addresses on stack

Pointer overwrite targets
  Function pointers (denote code)
    pointers from longjmp
    pointers to user functions
    pointers to library functions (PLT: procedure linkage table)
    pointers to virtual methods (C++ vtable)
  or usual pointers to data
The attack has two steps:
  a buffer overflow overwrites a pointer (to the desired address)
  in later code, this is used to overwrite a critical area: return address, PLT, etc.
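The canary idea from Option 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: real canaries are placed by the compiler in the actual stack frame (for example with GCC's -fstack-protector option), whereas here a hypothetical struct frame stands in for the frame, the names raw_copy and copy_with_canary are invented for the example, and the copy is clamped to the struct size so that the "overflow" stays within defined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simulated stack frame: a 12-byte buffer followed by a canary word.
   (Hypothetical layout; a real frame is laid out by the compiler.) */
struct frame {
    char buffer[12];
    unsigned long canary;
};

/* Unchecked copy that starts at buffer and may spill into the canary.
   The length is clamped to the size of the struct, so the write never
   leaves the simulated frame. */
static void raw_copy(struct frame *f, const char *src, size_t len) {
    assert(f != NULL && src != NULL);
    if (len > sizeof *f) len = sizeof *f;
    memcpy((char *)f, src, len);   /* buffer is the first member, offset 0 */
}

/* Copies len bytes from src into a fresh frame guarded by the canary value
   secret. Returns true if the canary is intact afterwards, false if the
   copy overwrote it (overflow detected). */
bool copy_with_canary(const char *src, size_t len, unsigned long secret) {
    struct frame f;
    f.canary = secret;
    raw_copy(&f, src, len);
    return f.canary == secret;
}
```

A short input such as "hi" leaves the canary untouched, while the 27-byte string from the code example above overwrites it, so the check fails. In a real process the secret would be a random value chosen at startup rather than a parameter.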